FOREWORD


The Software Engineering Laboratory (SEL) is an organization 
sponsored by the National Aeronautics and Space Administra- 
tion/Goddard Space Flight Center (NASA/GSFC) and created for 
the purpose of investigating the effectiveness of software 
engineering technologies when applied to the development of 
applications software. The SEL was created in 1977 and has 
three primary organizational members: 

NASA/GSFC (Systems Development Branch) 

The University of Maryland (Computer Sciences Department) 

Computer Sciences Corporation (Systems Development Operation)

The goals of the SEL are (1) to understand the software 
development process in the GSFC environment; (2) to measure 
the effect of various methodologies, tools, and models on 
this process; and (3) to identify and then to apply success- 
ful development practices. The activities, findings, and 
recommendations of the SEL are recorded in the Software 
Engineering Laboratory Series, a continuing series of 
reports that includes this document. The papers contained 
in this document appeared previously as indicated in each 
section. 

Single copies of this document can be obtained by writing to 

Systems Development Branch 

Code 552 

NASA/GSFC 

Greenbelt, Maryland 20771 
SECTION 1 - INTRODUCTION

This document is a collection of technical papers produced
by participants in the Software Engineering Laboratory (SEL)
during the period June 1, 1987, through January 1, 1989.

The purpose of the document is to make available, in one
reference, some results of SEL research that originally
appeared in a number of different forums. This is the sixth
such volume of technical papers produced by the SEL.
Although these papers cover several topics related to
software engineering, they do not encompass the entire scope
of SEL activities and interests. Additional information about
the SEL and its research efforts may be obtained from the
sources listed in the bibliography at the end of this document.

For the convenience of this presentation, the twelve papers 
contained here are grouped into three major categories: 

• Software Measurement and Technology Studies

• Measurement Environment Studies

• Ada Technology Studies

The first category presents experimental research and eval- 
uation of software measurement and technology; the second 
presents studies on software environments pertaining to 
measurement. The last category represents Ada technology 
and includes research, development, and measurement studies. 

The SEL is actively working to increase its understanding
and to improve the software development process at Goddard
Space Flight Center (GSFC). Future efforts will be
documented in additional volumes of the Collected Software
Engineering Papers and other SEL publications.
SECTION 2 - SOFTWARE MEASUREMENT AND TECHNOLOGY STUDIES


The technical papers included in this section were originally 
prepared as indicated below. 

• "The Effectiveness of Software Prototyping: A Case
  Study," M. V. Zelkowitz, Proceedings of the 26th
  Annual Technical Symposium of the Washington, D.C.
  Chapter of the ACM, June 1987

• "Measuring Software Design Complexity," D. N. Card
  and W. W. Agresti, The Journal of Systems and
  Software, June 1988

• "Quantitative Assessment of Maintenance: An
  Industrial Case Study," H. D. Rombach and V. R. Basili,
  Proceedings from the Conference on Software
  Maintenance, September 1987

• "Resource Utilization During Software Development,"
  M. V. Zelkowitz, The Journal of Systems and
  Software, 1988
THE EFFECTIVENESS OF SOFTWARE PROTOTYPING:

A Case Study 


Marvin V. Zelkowitz 

Department of Computer Science 
University of Maryland
College Park, Maryland 20742 


ABSTRACT 

This paper discusses resource utilization over the life
cycle of software development and the role that the
current "waterfall model" plays in the actual software
life cycle. The effects of prototyping are measured with
respect to the life cycle model. Software production in
the NASA environment was analyzed to measure these
differences. Data from thirteen different projects and
one prototype development were collected by the Software
Engineering Laboratory at NASA Goddard Space Flight
Center and analyzed for similarities and differences.
The results indicate that the waterfall model is not
very realistic in practice, and that a prototype
development follows a life cycle similar to that of a
production system, although, for this prototype, issues
like system design and the user interface took precedence
over issues such as correctness and robustness of the
resulting system.


KEYWORDS: Life cycle, Measurement, Prototyping, 
Resource utilization, Waterfall chart 

1. Introduction 

As technology impacts the way industry builds
software, there is increasing interest in understanding the
software development model and in measuring both the
process and product. New workstation technology, new
languages (e.g., Ada, requirements and specification
languages) as well as new techniques (e.g., prototyping,
pseudocode) are impacting how software is built, which
further impacts how management needs to address these
concerns in controlling and monitoring a software
development.

In this paper, data are first presented which analyze
several fairly large software projects from NASA Goddard
Space Flight Center (GSFC) and put the current
"waterfall" model in perspective. Data about software
costs, productivity, reliability, modularity and other
factors are collected by the Software Engineering
Laboratory (SEL), a research group consisting of individuals
from NASA/GSFC, Computer Sciences Corporation, and
the University of Maryland, for research on improving
both the software product and the process for building
such software [SEL 82]. The Software Engineering
Laboratory was established in 1976 to investigate the
effectiveness of software engineering techniques for
developing ground support software for NASA [BAS 78].
A recent prototyping experiment was conducted and data
were collected which compare this prototype with the
more traditional way to build software. The paper
concludes with comments on the role of prototyping as a
software development technique.

© 1987 Association for Computing Machinery, Inc.
Permission to copy without fee all or part of this material is
granted provided that the copies are not made or distributed
for direct commercial advantage, the ACM copyright notice and
the title of the publication and its date appear, and notice is
given that copying is by permission of the Association for
Computing Machinery. To copy otherwise, or to republish,
requires a fee and/or specific permission.

The software development process is typically
product-driven, and can be divided into six major life
cycle activities, each associated with a specific "end
product" [WAS 83, ZEL 78]:

(1) Requirements phase and the publication of a 
requirements document. 

(2) Design phase and the creation of a design docu- 
ment. 

(3) Code and Unit Test phase and the generation of 
the source code library. 

(4) System integration and testing phase and the 
fulfillment of the test plan. 

(5) Acceptance test phase and completion of the 
acceptance test plan. 

(6) Operation and Maintenance phase and the 
delivery of the completed system. 

In order to present consistent data across a large number 
of projects, this paper only focuses on the interval 
between design and acceptance test and involves the 
actual implementation of the system by the developer 
group. 

In this paper, we will refer to the term activity as
the work required to complete a specific task. For
example, the coding activity refers to all work done in
generating the source code for a project, the design activity
refers to building the program design, etc. On the other
hand, the term phase will refer to that period of time
when a certain activity is supposed to occur. For
example, the Coding Phase will refer to that period of time
during a software development when coding activities are
supposed to occur. It is closely related to
management-defined milestone dates for a project. But during
this period, other activities may also occur.


[Figure: bars for REQUIREMENTS, DESIGN, CODE, INTEGRATION,
ACCEPTANCE TEST, and OPERATION plotted against life cycle
calendar time]

Figure 1. Typical Life Cycle

The waterfall model makes the assumption that all
activity of a certain type occurs during the phase of that
same name and that phases do not overlap. Once a phase
ends, the next phase begins. Thus all requirements
activity for a project occurs during the Requirements Phase;
all design activity occurs during the Design Phase. Once a
project has a design review and enters the Coding Phase,
then all activity is Coding. Since many companies keep
data based upon hours worked by calendar date, this
model is very easy to track. However, as Figure 1 shows,
activities overlap and do not lie in separate phases. We
will give more data on this later.

2. The waterfall chart is all wet

In the NASA/GSFC environment that we studied, 
the software life cycle follows a fairly standard set of 
activities [SEL 81]: 

The requirements activity involves translating the
functional specification, consisting of physical attributes
about the spacecraft to be launched, into requirements for
a software system that is to be built.

The design activity can be divided into two
subactivities: the preliminary design activity and the detailed
design activity. During preliminary design, the major
subsystems are specified, and input-output interfaces and
implementation strategies are developed. During detailed
design, the system architecture is extended to the
subroutine and procedure level. Data structures and formal
models of the system are defined. These models include
procedural descriptions of the system, dataflow
descriptions, complete description of all user input, system
output, and input-output files, operational procedures,
functional and procedural descriptions of each module and
complete description of all internal interfaces between
modules.

The Coding and Unit Test activity involves the 
translation of the detailed design into a source program 
in some appropriate programming language (usually 
FORTRAN). Each programmer will unit test each 
module for apparent correctness. 

The System Integration and Test activity validates 
that the completed system produced by the coding and 
unit test activity meets its specifications. Each module,
as it is completed, is integrated into the growing system 
and integration test is performed to make sure that the 
entire package executes as expected. Functional testing 
of end-to-end system capabilities is performed according 
to the system test plan developed as part of the require- 
ments activity. 

In the Acceptance Test activity, the development 
team provides assistance to the acceptance test team, 
which checks that the system meets its requirements. 

Operation and Maintenance activities begin after 
acceptance testing when the system becomes operational. 
For flight dynamics software at NASA, these activities 
are not significant to the overall cost. Most software
produced is highly reliable. In addition, the flight dynamics
software is usually not mission critical in that a failure of 
the software does not mean spacecraft failure but simply 
that the program has to be rerun. In addition, many of 
these programs (i.e., spacecraft) have limited lifetimes of 
six months to about three years. 

Table 1 presents the raw data on the fourteen
projects analyzed in this paper. The thirteen numbered
projects are all fairly large flight dynamics programs,
ranging in size from 15,500 lines of FORTRAN code to 89,513
lines of FORTRAN, with an average size of 57,890 lines
of FORTRAN per system. The average work on these
projects was 89.0 staff months; thus, all represent
significant effort. The last project listed in Table 1,
FDAS, represents a prototype development and will be
discussed in more detail later.

In most organizations, phase data are collected 
weekly so that they are the usual reporting mechanism. 
However, in the SEL, activity data are also collected. 
The data that are collected consist of nine possible activi- 
ties for each component (i.e., source program module) 
worked on for that week. In this paper, these will be 
grouped as Design activities, Coding activities (code 
preparation and unit testing), Integration testing, Accep- 
tance testing and Other. Specific review meetings, such 
as design reviews, will be grouped with their appropriate 
activity (e.g., a design review is a design activity, a code 
walkthrough is a coding activity, etc.). This allows us to 
look at both phase and activity utilization. 
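
The difference between phase data and activity data can be made concrete with a small sketch. The record layout, milestone dates, and hours below are hypothetical and are not taken from the SEL database; the only point is that the same weekly records yield two different tallies, one keyed by the calendar phase in which the work fell and one keyed by the activity actually performed.

```python
from collections import defaultdict
from datetime import date

# Hypothetical milestone dates: each phase starts on the given date and
# runs until the next phase starts.
PHASE_STARTS = [
    (date(1986, 1, 6), "Design"),
    (date(1986, 4, 7), "Code"),
    (date(1986, 9, 1), "Integration test"),
    (date(1986, 11, 3), "Acceptance test"),
]


def phase_of(day):
    """Return the phase whose calendar interval contains `day`."""
    current = None
    for start, name in PHASE_STARTS:
        if day >= start:
            current = name
    if current is None:
        raise ValueError("date precedes the first milestone")
    return current


def tally(records):
    """Sum hours twice: by calendar phase and by reported activity."""
    by_phase = defaultdict(float)
    by_activity = defaultdict(float)
    for day, activity, hours in records:
        by_phase[phase_of(day)] += hours
        by_activity[activity] += hours
    return dict(by_phase), dict(by_activity)


# Hypothetical weekly records: (week start, activity, hours).
records = [
    (date(1986, 2, 3), "Design", 40.0),
    (date(1986, 3, 10), "Coding", 8.0),    # coding begun during design phase
    (date(1986, 5, 5), "Coding", 30.0),
    (date(1986, 5, 12), "Design", 12.0),   # redesign during coding phase
    (date(1986, 9, 15), "Integration testing", 20.0),
    (date(1986, 9, 22), "Design", 6.0),    # redesign during testing
]

by_phase, by_activity = tally(records)
print(by_phase)
print(by_activity)
```

In this made-up data the Design phase absorbs 48 hours, yet 58 hours of design activity were performed overall, which is the kind of discrepancy the phase-only view hides.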
           PROJECT SIZE AND STAFF-MONTH EFFORT

PROJECT    SIZE (LINES    TOTAL EFFORT    STAFF-
NUMBER     OF CODE)       (HOURS) *       MONTHS

    1        15,500          17,715        116.5
    2        50,911          12,588         82.8
    3        61,178          17,039        112.4
    4        26,844          10,946         72.0
    5        25,731           1,514         10.0
    6        67,325          19,475        128.4
    7        66,260          17,997        118.4
    8           +               +             +
    9        55,237          15,262        100.4
   10        75,420           5,792         38.1
   11        89,513          15,122         99.5
   12        75,393          14,506         95.4
   13        85,369          14,309         94.1

Average      57,890          13,522         89.0

FDAS         33,967          14,450         93.1

+ - Raw data not available in data base
* - All technical effort including programmer and management time

Table 1. Project Size and Staff-month Effort
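
Table 1 does not state how staff-hours were converted to staff-months, but the ratio can be checked from the rows themselves. The following is a minimal sketch using three rows as printed; the conversion factor it reports is inferred from the table, not quoted from the paper, and the variable names are illustrative only.

```python
# (total effort in hours, staff-months) for projects 1, 2, and 13 of Table 1.
rows = {
    1: (17715, 116.5),
    2: (12588, 82.8),
    13: (14309, 94.1),
}

# Implied number of staff-hours in one staff-month, per project.
hours_per_staff_month = {
    project: hours / months for project, (hours, months) in rows.items()
}

for project, ratio in sorted(hours_per_staff_month.items()):
    print(f"project {project:2d}: {ratio:.1f} hours per staff-month")
```

All three rows imply roughly 152 hours per staff-month, which suggests that a single fixed conversion was applied across projects.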

The results of this can be briefly summarized by 
Table 2. According to this, in NASA, 22% of a project’s 
effort is during the design phase, while 46% is during 
coding. Integration testing takes 16% while all other 
activities take 12%. (Remember that requirements data 
are not being collected here. We are simply reporting the 
percentage of design, coding, and testing activities. A 
significant requirements activity does occur.) 


            Design    Code    Int. Test    Other

            [cell values illegible in the source]

Table 2. Activities performed in each phase (by %)


However, actual activities differ somewhat from
simply looking at effort spent between somewhat arbitrary
calendar dates set up months in advance. By looking at
all design effort across all phases of the projects, design
activity is actually 25% of the total effort rather than the
22% listed above. Coding is a more reasonable 30%,
which means that the coding phase includes many other
activities. "Other" increased from 12% to 29%, and
includes many time-consuming tasks that are not
accounted for by the usual life cycle. (Here, Other
includes acceptance testing, as well as activities that take
a significant effort but are usually not separately
identifiable using the standard model. These activities
include meetings, training, travel, documentation, and
other various activities assigned to the project.)

The situation is actually more complex than shown 
in Table 2. Although using Phase Date shows that total 
design effort differs by only 3% from the design phase 
effort, the distribution of design activity throughout the 
project is not reflected in the table. These data are 
presented in Table 3. 




Design    Code    Int. Test    Accept. Test

  50       29        20             2

Table 3. Design Activity During Life Cycle Phases (by %)

As Table 3 shows, only 50% of all design work 
occurs during the Design Phase and just under one third 
of the total design activity occurs during the coding 
period. Over one fifth (20%+2%) of all design occurs 
during testing when the system is “supposed" to be 
finished. 
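
The percentages in Tables 3 through 5 answer a different question from the phase totals: for one activity, what share of its hours fell in each calendar phase? A minimal sketch of that calculation follows. The hours are hypothetical and chosen only for illustration; they are not SEL data.

```python
def spread_over_phases(hours_by_phase):
    """Return each phase's share (in percent) of one activity's total hours."""
    total = sum(hours_by_phase.values())
    if total <= 0:
        raise ValueError("no effort recorded for this activity")
    return {phase: 100.0 * hours / total
            for phase, hours in hours_by_phase.items()}


# Hypothetical hours of *design activity*, keyed by the phase in which
# the work was performed.
design_hours = {
    "Design": 1200.0,
    "Code": 700.0,
    "Int. Test": 400.0,
    "Accept. Test": 100.0,
}

shares = spread_over_phases(design_hours)
outside_design_phase = 100.0 - shares["Design"]

for phase, pct in shares.items():
    print(f"{phase:<13}{pct:5.1f}%")
print(f"design work done outside the Design phase: {outside_design_phase:.1f}%")
```

With these invented hours, half of the design work lands outside the Design phase, a quantity that a phase-dated tally cannot expose.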

As to coding effort, Table 4 shows that while a
major part, or 70% of the coding effort, does occur
during the Coding Phase, almost one quarter (16%+7%)
occurs during the testing periods. As expected, only a
small amount of coding (7%) occurs during the design
phase; however, it does indicate that some coding does
begin on parts of the system while other parts are still
under design.


Design    Code    Int. Test    Accept. Test

   7       70        16             7

Table 4. Coding Activity during Life Cycle Phases (by %)

Similarly, Table 5 shows that significant integration 
testing activities (about 34%) occur before the integra- 
tion testing period. Once modules have been unit tested, 
programmers begin to piece them together to build larger 
subsystems. 


Design    Code    Int. Test    Accept. Test

   0       34        63             3

Table 5. Integration Activity during Life Cycle Phases

3. Prototyping

As can be seen, programmers readily flow from one
activity of a project to another, more like a series of
rapids than a discrete set of waterfalls. Any model
that does not reflect this cannot hope to accurately
portray software development. Boehm has proposed a spiral
model [BOE 86] of software development which takes
some of this into account. In addition, the concept of
prototyping has been proposed as an alternative concept.
The remainder of this paper will address the prototyping
issue.

The current model of software development is
becoming even more complex. As new techniques are
developed, how do they fit into the life cycle? For
example, pseudocode is often written to describe a design.
This pseudocode is often iterated in greater detail to
evolve into the source program. However, when does
pseudocode stop being design and when does it become a
source program? Prototyping is another technique which
doesn’t fit into this model well. In a prototype, the
developer builds some operational aspect of the system
and then evaluates the prototype with respect to some
criteria. Where does this coding and testing fit? What
activity is this in the overall life cycle?

At NASA, a prototype was developed to investigate 
implementation strategies for a new product. In this sec- 
tion, the role of the prototype will be described and the 
resulting data collected from building the prototype will 
be compared with the historical life cycle data presented 
in the preceding section.

A prototype Flight Dynamics Analysis System 
(FDAS) was implemented by NASA/GSFC. Data were 
collected during the development of the system. For typ- 
ical flight dynamics software, which NASA has consider- 
able experience in building, prototyping would be of lim- 
ited benefit due to significant knowledge of how previous 
systems were built. However, in this case, FDAS was to 
be a source code maintenance system to manage other 
source code libraries. It would enable NASA analysts to 
test new spacecraft orbit models by providing a human- 
engineered common interface which could be used to 
invoke other flight dynamics packages. Since it was 
unlike previous NASA projects, and since NASA person- 
nel had limited knowledge of exactly how to build this 
system, FDAS was a good candidate for prototyping. 

The goal of the prototype was to understand the 
problem domain better. As such, an early decision was 
made to build the system with every expectation of 
throwing it away. If part of the source program could be 
transferred to the final system, then that would be 
viewed as an unexpected bonus. After the prototype was 
built, it would be evaluated and from this experience the 
requirements for a production version of FDAS would be 
developed. Therefore, the basic idea of the prototype 
was to learn, and it fits into the life cycle as part of the 
requirements phase of Figure 2. 

This definition of prototyping differs from others
that view a prototype as a first release of a system. The
goal was clearly to be able to understand the problem
and not to generate useable source programs. In another
study [BOE 84], prototyping was viewed as an iterative
process converging on the final product.

We viewed the prototype as part of the requirements
analysis of the problem. However, since the prototype
was to execute, it itself had a full development life cycle.
As Table 1 previously showed, since FDAS was almost
34,000 lines of code and took about 93 staff months to
complete, it was a rather large project by itself.

FDAS was to be an interactive system. That meant 
that the user interface was crucial. Because of this, it was 
determined that the prototype should emphasize that 
aspect of system design. 

The prototype was built in FORTRAN for a DEC
VAX 11/780. In hindsight it is not clear that such an
implementation was the wisest. However, at the start,
the problem did not seem that complex, and personnel
experience and available hardware and software lent
themselves to a FORTRAN implementation. Since the
goal was to give the user a taste of what services the
system would provide, a screen simulation applications
package (e.g., Rapid/Use [WAS 86]), a very high level
simulation, or a 4th generation language might have been
adequate.

Figure 2. Prototype as part of Software Life Cycle

The use of FORTRAN, however, did have some
benefits. For one, it gave the developers experience in
using FORTRAN in a type of text-processing application
for which they had little previous experience. One of the
reasons that the NASA group generally has high
productivity is that they have had considerable experience in
their application area. By building the prototype in
FORTRAN, they were using Brooks’ second system property
where he advises "plan to throw one away" [BRO 75]. By
building a first prototype in FORTRAN, mistakes would
undoubtedly be made. By planning on discarding the
prototype rather than patching it to correct errors, the
ultimate FDAS system should be more reliable and better
structured, even if it did not turn out to be cheaper.
This by itself is a valuable property, although it is not
clear that it is a measurable one on most projects.
A more important aspect of a FORTRAN
implementation (at least with respect to this paper) is that the
FDAS prototype was a "typical" FORTRAN project.
Hence its life cycle characteristics and the data that were
collected could be compared with many other projects in
the NASA database. This would not have been possible if
some other mechanism (e.g., simulation package of some
sort) were used.

In the next section, the prototype will be evaluated. 
However, here are some of our general conclusions. The 
handling of requirements differed from a production sys- 
tem; FDAS requirements were incomplete when design 
began. Unlike previous projects, they were not stated 
precisely because aspects of the system were still an open 
subject during development [ZEL 84]; even identifying 
the potential user community and its impact on the user 
interface and its effect on “assumed computer experience" 
was still being considered. Dates for completion of each 
phase were more flexible than in the historical data and 
milestones were less rigid than in a production develop- 
ment. During other phases, requirements were generally 
modifiable which in turn affected all activities in each 
phase. 

More time was spent in design than is usual for a
typical project. Unlike other NASA projects, an exten- 
sive review process took place almost weekly as design 
decisions were made and altered. The coding and testing 
efforts had no formal review. Although status meetings 
were held almost weekly, the developers placed less 
emphasis on testing than with a production system; and 
since the prototype had a very limited lifetime, features 
that seemed well understood but cumbersome to imple- 
ment were deleted from the requirements. According to 
the final report, coding took less time than in previous 
projects but testing did consume the same amount of 
effort. Very little effort was spent on acceptance testing, 
since the effective life of the prototype was short. 


4. Evaluation of Prototype

In a manner similar to the 13 other NASA projects,
the FDAS project was analyzed by phases and activity
data in the SEL database.

4.1. Phase Analysis

Data collection based on phases is shown in Table 6.
Effort expended for design, coding, and testing were
[...]. But notice that acceptance testing was only
1.3% in the prototype compared with 12.7% in the
historical data. Because of its limited lifetime,
reliability was a limited [...]; as long as the system
worked for evaluation, [...]. In addition, integration
testing took 10% more effort (26% compared to 16%) in the
prototype. We believe this was due to a "schedule
slippage" in the prototype that caused activities to be
delayed until the end [...].

             DEVELOPMENT EFFORT BY PHASE DATE
             (13 Projects vs Prototype FDAS)

PROJECT    DESIGN    CODE     INTEG.    ACC TST
NUMBER      (%)      (%)       (%)        (%)

    1       20.6     38.6     16.5       24.3
    2       16.2     48.4     19.3       16.2
    3       21.8     47.9     17.4       12.9
    4       35.9     39.5     24.5        0.1
    5       18.2     68.8     13.0        0.0
    6       16.3     48.6     10.9       24.3
    7       19.0     50.4     14.9       15.7
    8       22.9     48.4     13.0       15.8
    9       22.6     68.3      8.1        1.1
   10       24.4     44.6     20.2       10.8
   11       22.7     39.4     21.4       16.5
   12       16.9     53.1     10.9       19.1
   13       28.2     43.5     20.1        8.2

Average     22.0     49.2     16.2       12.7

FDAS        27.0     45.3     26.4        1.3

Table 6. Software Development Effort by Phase


4.2. Activity Analysis 

In the previous subsection, we viewed effort by
phase date. Table 7 displays the actual activities of
design, coding and integration test effort independent of
phase. In this case the results differ. Usually during the
design phase, coding and testing activities begin on some
modules, and in the code and unit test phase, additional
design activity continues. Integration testing begins as
soon as coding and unit testing of a component
completes. Similarly, during the testing phase, any errors
that were uncovered might require substantial redesign
and recoding. Comparing with Table 6, we discover that
most NASA developments have additional design effort
later in the life cycle to raise total design effort from 22%
to 25.6%. In the FDAS case, total design dropped from
27% to 25%, meaning that activities other than design
occurred in the design phase. In both cases, activities
other than coding occur during the coding phase since
actual coding activity was only 30.5% and 17.6%
respectively, as opposed to the 45+% of effort of the
coding phase (Table 6).

Comparing FDAS with the 13 other developments, 
design effort is comparable at 25%, but the code and unit 
test effort and the integration test effort were different. 
Due to the wide variability of the “other" category of 
Table 7, Table 8 presents the same data as relative per- 
cent for Design, Code, and Integration testing only. This 
shows the differences more clearly. 
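
The conversion from absolute activity percentages to the relative percentages described above amounts to dropping the "Other" share and renormalizing the remaining three. The sketch below applies it to the FDAS figures quoted in this paper (25% design, 17.6% coding, and 25.1% integration testing from the FDAS row of Table 7); the function name is illustrative and not part of the original study.

```python
def relative_percent(design, code, integration):
    """Renormalize three activity shares so that they sum to 100%."""
    total = design + code + integration
    if total <= 0:
        raise ValueError("activity shares must sum to a positive value")
    return tuple(round(100.0 * share / total, 1)
                 for share in (design, code, integration))


# FDAS activity shares as a percentage of all effort, "Other" excluded.
fdas_relative = relative_percent(25.0, 17.6, 25.1)
print(fdas_relative)  # (36.9, 26.0, 37.1)
```

The result agrees with the 26% relative coding effort and the roughly 37% design and integration testing efforts reported for FDAS.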

        DEVELOPMENT EFFORT BY ACTIVITY IN ALL PHASES
              (13 Projects vs Prototype FDAS)

PROJECT    DESIGN     CODE       INTEG      OTHER
NUM        ACT (%)    ACT (%)    ACT (%)    ACT (%)

    1       17.4       16.4        9.9       56.3
    2       30.1       39.4       20.8        9.7
    3       26.3       20.3       19.3       34.2
    4       27.3       28.7        6.0       38.0
    5       31.0       35.5        9.4       24.1
    6       14.9       21.8       24.0       39.2
    7       20.2       25.9       14.3       39.6
    8       11.0       13.9        9.3       65.8
    9       31.3       43.5       18.9        6.4
   10       38.2       37.3        6.1       18.4
   11       29.3       31.0       17.2       22.5
   12       23.7       46.5       24.0        5.9
   13       32.6       36.3       15.6       15.6

Average     25.6       30.5       15.0       28.9

FDAS        25.0       17.6       25.1       32.3

Table 7. Software Development Effort by Activity

No formal review was performed on the prototype
during coding and unit testing. Because of the decision
to delete hard-to-build but understood features that did
not affect the FDAS evaluation, coding was quite
straightforward. Most of the easy coding was completed
in a rather short time, and the more difficult coding
aspects were simply not implemented. As Table 8
indicates, at 26% coding, FDAS had the lowest relative
coding effort of any of the 14 measured projects. The next
lowest was 30.8% and the average over all 13 was 42.2%.
In addition, while in most projects the design and
integration testing efforts were less than the coding
activity, in FDAS both were almost 50% greater than for
coding (about 37% for each compared to 26% for
coding).


           PER CENT EFFORT IN EACH PHASE
          (13 Projects vs Prototype FDAS)

PROJECT    DESIGN     CODE&UNIT    INTEG.
NUM        ACT(%)     ACT(%)       ACT(%)

    1       39.9        37.5        22.6
    2       33.3        43.7        23.0
    3       39.9        30.8        29.3
    4       44.0        46.3         9.7
    5       40.8        46.8        12.3
    6       24.6        35.9        39.5
    7       33.5        42.8        23.6
    8       32.2        40.7        27.1
    9    [illegible]
   10       46.8        45.7         7.5
   11       37.8        40.1        22.1
   12       25.2        49.4        25.5
   13       38.6        43.0        18.4

Average     36.2        42.2        21.6

FDAS        36.9        26.0        37.1


Based on the original productivity rate of 1.4 source
lines of code (SLOC) per hour on most NASA projects
[BAS 81], FDAS with a size of 33,967 SLOC had a
productivity rate of 2.4 SLOC per hour. (Note: the
average project size of 57,890 SLOC of Table 1 cannot simply
be divided by the average effort of 13,522 hours since
most NASA projects reuse some code from previous
systems. Table 1 is total system size, and the productivity
rate is for new lines of code.)
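
The productivity figures can be reproduced with simple arithmetic. The sketch below uses the FDAS size and the FDAS effort as printed in Table 1, and assumes (as the text implies for a throwaway prototype, but does not state outright) that all FDAS code was newly written. It also shows why dividing the Table 1 averages is misleading: the quotient counts reused code as if it were new.

```python
def sloc_per_hour(new_sloc, staff_hours):
    """Productivity as newly written source lines per staff-hour."""
    if staff_hours <= 0:
        raise ValueError("staff_hours must be positive")
    return new_sloc / staff_hours


BASELINE_RATE = 1.4  # SLOC per hour quoted for most NASA projects [BAS 81]

# FDAS: 33,967 SLOC and 14,450 hours (Table 1), assumed to be all new code.
fdas_rate = sloc_per_hour(33967, 14450)

# Naive quotient of the Table 1 averages; overstates productivity because
# total system size includes code reused from earlier systems.
naive_rate = sloc_per_hour(57890, 13522)

print(f"FDAS:           {fdas_rate:.1f} SLOC/hour")
print(f"naive average:  {naive_rate:.1f} SLOC/hour")
print(f"quoted average: {BASELINE_RATE:.1f} SLOC/hour")
```

The gap between the naive quotient and the quoted 1.4 SLOC per hour gives a rough sense of how much of a typical flight dynamics system was not newly written.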

4.2.1. Design Effort 

A true picture of development can be achieved by 
investigating actual activity during each phase. 
Although design is supposed to occur principally during
the design phase, for both the 13 older projects and the 
FDAS prototype a comparable one half of the total 
design effort occurred during the design phase, and equal 
amounts were distributed through the rest of the life 
cycle (Table 9). This repeats Table 3 in more detail. 
Only 2% of the design of FDAS occurred during the 
acceptance test phase in the prototype, principally 
because the FDAS acceptance testing phase was so short 
and the few errors that were found did not get 
redesigned and corrected. For the historical data, the 
6.4% of design occurring during acceptance testing 
represents errors found in testing that required source 
code to be redesigned. 


          DESIGN ACTIVITY EFFORT IN EACH PHASE
            (13 Projects vs Prototype FDAS)

PROJECT    DESIGN      CODE        INTEG.     ACC.TST.
NUM        PHASE(%)    PHASE(%)    TEST(%)

    1       41.8        33.9        10.0       14.3
    2       53.6        31.2         9.2        6.0
    3       33.3        37.1        19.7        9.9
    4       45.3        32.6        22.0        0.1
    5       17.4        69.1        13.5        0.0
    6       58.9        30.7         4.3        5.2
    7       63.9        15.3         6.8       14.1
    8       28.1        56.9         7.1        8.0
    9       61.8        38.2         0.0        0.0
   10       57.8        27.2         7.0        8.0
   11       58.7        13.7        16.7       10.9
   12       58.9        32.8         5.9        2.4
   13       60.5        24.7        11.9        2.9

Average     49.2        34.1        10.3        6.4

FDAS        49.8     [illegible]    19.6        1.7

Table 9. Design Activity Effort


Table 8. Relative Activity 

This apparent short circuiting of coding, however, 
appeared to have a detrimental effect on testing, which 
took a relative 37.1% of effort as opposed to 21.6% on 
other projects. Only one other project (6) took as much 
effort (39%) and from Table 1 project 6 was the most
costly, where you might expect an excessive need for test- 
ing. 


4.2.2. Code and Unit Test Effort

The code and unit test activities in the prototype,
however, represent a departure from the older projects
(Table 10). In most developments, about 7% of the
coding occurs during design (although it varied from
0% [...] in other projects). Implementation
often begins as some components become completely
specified. However, with FDAS, due to its greater
uncertainty, no coding occurred until the development team
really understood the design, i.e., until the coding phase
began. For most projects, 70% of the total code and
unit test effort is in the coding phase, but in the
prototype almost 96% of the effort was during coding. Coding
often extends through acceptance testing, but with
FDAS’s relatively light acceptance test, few critical errors
were found so little effort was spent in recoding during
test. Coding and testing need to be carried out on the
full system for every change or modification of the
design, but in the prototype it was not necessary to code
the new design.


CODE & TEST ACTIVITY EFFORT IN EACH PHASE
(13 Projects vs Prototype FDAS)

PROJECT    DESIGN      CODE        INTEG.     ACC.TST.
NUM        PHASE(%)    PHASE(%)    TEST(%)    PHASE(%)

1          --          78.3        11.3        9.1
2          --          72.8        19.7        7.5
3          --          56.2        11.8        9.8
4          16.4        58.5        25.1        0.1
5          21.2        88.7        10.1        0.0
6           0.5        77.3        11.3       10.9
7           1.3        73.9        15.6        9.2
8          14.7        54.7        21.0        9.7
9           5.2        91.1         3.1        0.6
10          0.0        73.0        22.5        4.5
11          2.2        70.5        20.1        7.2
12          0.3        74.8         8.3       16.6
13          4.6        63.6        26.9       --

Average     6.9        70.3        15.9       --

FDAS        0.0        95.9         4.1       --

Note: -- marks a value not legible in the source.

Table 10. Code & Unit Test Activity Effort


4.2.3. Integration Test Effort

Integration test effort is distributed through all
phases in the collected projects, with more effort (43%)
during the code & unit test phase than in either the
integration phase (28%) or the acceptance test phase (28%)
(Table 11). In general, almost 50% of all integration
testing occurs during the design and coding phases. In FDAS,
this effort was delayed, with about two-thirds of all
integration activities in the integration phase. This was
due to delaying the integration until more pieces of the
system were completed.


INTEGRATION ACTIVITY EFFORT IN EACH PHASE
(13 Projects vs Prototype FDAS)

PROJECT    DESIGN      CODE        INTEG.     ACC.TST.
NUM        PHASE(%)    PHASE(%)    TEST(%)    PHASE(%)

1          --          17.8        --         --
2          --          45.2        --         --
3          --          53.9        --         --
4          --          39.3        --         --
5          --          71.0        --         --
6          --          40.9        --         --
7          --          54.1        --         --
8          --          33.8        --         --
9          --          66.4        --         --
10         --          23.1        --         --
11         --          36.4        --         --
12         --          32.7        --         --
13         --          49.5        --         --

Average    --          43.4        --         --

FDAS        0.0        34.5        --          2.8

Note: -- marks a value not legible in the source.

Table 11. Integration Test Activity Effort


4.2.4. Other Activity Effort

The Other category consists of activities such as
travel, completion of the data collection forms, meetings,
or training. While these activities are often ignored in
most life cycle studies, the costs are significant.
Typically about 20% of activities are in this category, and of
the 13 measured projects, "other" consumed more than
one-third of the effort on 8 of them (Table 7). FDAS
had a comparable 32% "other". As seen in Table 12,
the prototype devoted more effort to the design phase
for meetings, traveling, and training due to the
extensive unknown quality of the design at the beginning
of the task. The acceptance test activity is low for the
similar reason that the prototype system had few users of
short duration and therefore no detailed tests. On the 13
collected projects, the Other activities are distributed
more uniformly during all phases, including the
acceptance test, where there is a need to test before actually
turning the system over to the user.


OTHER ACTIVITIES EFFORT IN EACH PHASE
(13 Projects vs Prototype FDAS)

PROJECT    DESIGN      CODE&TST    INTEG.     ACC.TST.
NUM        PHASE(%)    PHASE(%)    TEST(%)    PHASE(%)

1          23.3        32.2        18.1       26.5
2           0.0         9.1        26.4       64.6
3          21.7        47.8        16.8       13.7
4          48.2        30.2        23.6        0.0
5          11.0        67.7        21.3        0.0
6          18.2        44.2         9.0       28.7
7          14.4        51.6        14.5       19.5
8          26.5        47.7        11.4       14.4
9          15.9        65.5        18.7        0.0
10         12.4        30.2        35.9       21.5
11         21.4        32.2        18.9       27.6
12         47.3        46.6         4.6        1.5
13         42.5        30.0        12.7       14.9

Average    23.1        41.2        17.8       17.9

FDAS       45.1        38.8        15.7        0.3

Table 12. Other Activities Effort


5. Conclusion

In this paper we have collected data on many
software projects developed at NASA/GSFC and
compared them with a new prototype development. By using
data from the SEL database, it appears clear that the
software development process does not follow the
waterfall life cycle. It also appears that the prototype
development follows a life cycle pattern similar to that of other
software projects. Although a single data point (the prototype)
does not give definitive answers, it does give some trends
that are of interest.

Both approaches have similar software life cycles, 
but the effort distributed over each phase differs. The 
coding in the prototype was more ad hoc, therefore test- 
ing became more involved. Integration testing was 
harder in the prototype because of the false assumption 
that reliability was not a central issue. The production 
developments devote more effort in coding than in testing 
(Table 7). 

While not inexpensive, the prototype appears to be 
successful. Several design decisions turned out to be 
partially faulty when the prototype was tested. The 
human computer interface has been redesigned. 

In fact, after completion of the prototype, several
screen simulation systems were used to model a user
interface, and a more hierarchical menu model was
developed. Without the FDAS experience, NASA might
have implemented a system with which users had no real
experience until the large implementation was too
far along to change adequately.

The underlying execution model of FDAS became 
better understood. As a source code control system, the 
separation of the FDAS code and the user's flight dynam- 
ics application code became clearer. Most user programs 
would be FORTRAN (at least initially); however, other 
languages (e.g., Pascal, Ada) would be used in the future, 
while it would not matter to the user in what language
FDAS was itself written. 

FDAS included a prototype preprocessor to add 
abstract data types to FORTRAN. This preprocessor 
was initially tied directly to the FDAS implementation. 
It is now somewhat independent to allow for other 
preprocessors later. The FORTRAN preprocessor, called
OPAL, for Object Programming Applications Language
[CSC 86], is a more rational extension of FORTRAN
with data structures useful for flight dynamics applica-
tions, such as vectors, matrices, and quaternions. The
decision was also made to move away from FORTRAN, 
and the system itself is being implemented in Ada, 
although it will initially process FORTRAN application 
code. 

A new production FDAS implementation would 
avoid many potential pitfalls discovered via the proto- 
type. Currently the production version of FDAS is under 
development, and its design has benefited greatly from 
the earlier development. We will have to wait for com- 
pletion before fully evaluating this process. It is quite 
clear, however, that FDAS will be a much better product
than if the prototype had not been built.

Prototyping probably increases the cost of the sys- 
tem, but it greatly increases its quality. It gives a flavor 
to the end user of what the system can do and how it 
can perform the task, especially in a nonfamiliar environ- 
ment. It provides the developers a “second system” effect 
for perfecting a design.


Measuring Software Design Complexity


D. N. Card and W. W. Agresti 

Computer Sciences Corporation , Silver Spring, Maryland 


Architectural design complexity derives from two 
sources: structural (or intermodule) complexity and local 
(or intramodule) complexity. These complexity attributes 
can be defined in terms of functions of the number of I/O 
variables and fanout of the modules comprising the 
design. A complexity indicator based on these mea- 
sures showed good agreement with a subjective assess- 
ment of design quality but even better agreement with 
an objective measure of software error rate. Although 
based on a study of only eight medium-scale scientific 
projects, the data strongly support the value of the 
proposed complexity measure in this context. Further- 
more, graphic representations of the software designs 
demonstrate structural differences corresponding to the
results of the numerical complexity analysis. The pro- 
posed complexity indicator seems likely to be a useful 
tool for evaluating design quality before committing the 
design to code. 

1. INTRODUCTION 

Typically, design is the earliest stage of software 
development at which the pending software system is 
fully specified and in which the system structure is 
clearly defined. Design usually proceeds in two steps:
architectural, then detailed design. This study only
considers the former. Throughout the following
discussion, "design" will refer to architectural design unless
otherwise indicated. Assessment of the quality of a
software design rates high in the priorities of software 
developers and managers. However, the multitude of 
potentially conflicting design objectives, methods, and 
representations, as well as a lack of appropriate data, 
have hindered the development of effective measures of 
software design quality. 

One quality attribute, complexity, has been studied 
extensively. Early investigations [1,2] focused on the 
internal organization of individual programs or subpro-
grams rather than on the structure of software systems
composed of large numbers of subprograms (or mod-
ules). More recently, complexity studies have attempted
to consider software systems [3, 4]. Many of these 
approaches require extensive analysis (usually special 
tools) to compute values of the complexity measures 
proposed. Moreover, few of these measures can be 
computed at design time. The objective of this study was 
to define some “simple” complexity measures that 
could easily be derived during early design. 

The initial investigation considered many existing 
models of software complexity but did not find any of 
them suitable for this application because 1) necessary 
data were difficult to extract or compute, 2) required 
information was not available during architectural
design, and/or 3) our data did not support the model.
For example, all of these reservations apply to software
science [1]; see Card and Agresti [28].

This paper explains a new approach to measuring 
software design complexity that considers the structure 
of the overall system as well as the complexity incorpo- 
rated in individual components. The measures derive 
from a simple model of the software design process. 
Analysis of data from eight medium-scale scientific 
software projects showed that the complexity measures 
defined in this report provide a good estimate of the 
overall development error rate, as well as agreeing with 
a subjective assessment of design quality. Furthermore, 
differences in design complexity indicated by the com- 
plexity measures also demonstrated themselves in design 
profile graphs. 

This analysis relied on data collected by the Software 
Engineering Laboratory (SEL) from eight spacecraft 
flight dynamics projects. The SEL is a research program 
sponsored by the National Aeronautics and Space 
Administration [5]. It is supported by Computer Sci- 
ences Corporation and the University of Maryland. The 
objectives of the SEL are to measure the process of 
software development in the flight dynamics environ- 
ment at Goddard Space Flight Center, identify technol- 
ogy improvements, and transfer this technology to flight 
dynamics software practitioners. 

2. NATURE OF DESIGN COMPLEXITY

Architectural design is the process of partitioning the 
required functionality and data of a software system into 
parts that work together to achieve the full mission of the 
system. Thus, architectural design complexity can be 
viewed as having two components: 1) the complexity 
contained within each part (or module) defined by the 
design, and 2) the complexity of the relationships among 
the parts (modules). In the following discussion, we will 
refer to design parts as modules, in the sense that a 
module is the smallest independently compilable unit of 
code [6]. Each design part will eventually be
implemented as a software module. In the FORTRAN
environment of the SEL, modules correspond to
subroutines.

Many different approaches or methods achieve the 
same design result: a high-level architectural design and 
an integrated set of individual module designs. The 
detailed design (e.g., PDL) developed to implement the 
work assigned to a module provides another source of 
complexity that is not analyzed here. It is not the intent 
of this paper to address whether specific design methods 
result in lower-complexity (or better) design products. 
Rather, its objective is to demonstrate a complexity 
measurement approach that can be applied to a wide 
range of such products, regardless of how they were 
produced. The authors recognize that correct design 
practice is essential to achieving good designs. Gener- 
ally, this report shows that the conditions that result in 
lower values of the complexity measures are consistent
with accepted design practices. 

Of course, any complete design must include nonmo- 
dules such as files and COMMON blocks (in FOR- 
TRAN). Furthermore, partitioning is not the only design 
process. This proposed model only attempts to capture a 
subset of all the possible factors in complexity. As 
Curtis [7] points out, complexity depends on the 
perspective from which an object or system is viewed. 
This paper examines software complexity with respect to 
the difficulty of producing the designed system (for 
example, the difficulty of changing the implemented 
system is not considered). The following discussion is 
intended to illustrate the line of reasoning followed in 
developing the model and measures. It should not be 
construed as a mathematical proof that this model is a 
necessary and sufficient explanation of complexity. 


2.1 A Design Model 

One common approach to design is functional
decomposition (the basis of structured design [6]). It results in a
hierarchical network of units (or modules). For any
module, workload consists of input and output items
(data couples) to be processed. At each level of
decomposition, the designer must decide whether to
implement the indicated functionality (perform the
work) in the current module or defer some of it to a
lower level by invoking one or more other modules (via
control couples). Deferring functionality decreases the
local (intramodule) complexity but increases the
structural (intermodule) complexity (see Fig. 1). Similar
decisions also must be made when following other
design approaches (e.g., object oriented [8]).

Figure 1. Decomposition model of software design.
(Figure labels: workload = input/output data; connections
to work deferred = structural complexity.)

The internal design of a module (how the work is 
performed) may contribute procedural complexity, but 
that is outside the scope of this paper. Of course, many 
early studies of software complexity (e.g., [2]) focused 
on process construction. The distinction made here 
between local and procedural complexity parallels the 
distinction between the specification and the body of an 
Ada* package. 

Thus, architectural complexity is a function of the 
work performed (within modules) as well as the connec- 
tions among the work parts (modules). Effective design 
minimizes work as well as connections. This argument 
leads to the following formulation for the total
complexity of a software design:

    C_t = S_t + L_t    (1)

where

    C_t = total design complexity
    S_t = structural (intermodule) complexity
    L_t = local (intramodule) complexity

That is, the total complexity C_t of a given design can be
defined as the sum of intermodule and intramodule
complexity. In this simple model,
all complexity resides in one or the other of these two
components; hence, they are additive. These complexity
components correspond to the structured design
concepts of module strength (or cohesion) and coupling
defined by Stephens et al. [6].

* Ada is a registered trademark of the U.S. Government (Ada Joint
Program Office).

2.2 Relative Complexity 

Because projects (and designs) vary greatly in terms of 
magnitude, a measure of relative complexity ultimately 
may prove more useful than total complexity. Dividing 
by the number of modules defined in the design 
normalizes these complexity measures for size so that 
designs of different magnitudes may be compared: 

    C = S + L    (2)

where

    C = C_t/n (relative design complexity)
    S = S_t/n
    L = L_t/n
    n = number of modules in system

Although individual modules may vary greatly in size in 
terms of lines of source code, the module, as it is used 
here, is the unit of design. Hence it is the appropriate 
normalization factor. The rest of this discussion will 
concern relative complexity. 

3. DEFINITION OF COMPLEXITY MEASURES 

The next sections define measures for each of the two 
components of relative complexity just identified in 
Equation 2. The measures incorporate counts of the
design characteristics (calls, variables, and modules) 
identified in the model. (Table 1 summarizes some 
design measures from the modules studied in this 
analysis). The following sections also discuss methods 
and consequences of minimizing complexity as defined 
by this model. 


Table 1. Design Measures Summary

                   Minimum    Mean    Maximum

Module size        1          66      603
Fanin              1          1.3     16
Fanout             0          2.8     27
I/O variables      1          24      237
Level              2          6.1     11

Note: Based on 1,142 newly developed modules.


3.1 Structural Complexity 

Structural complexity derives from the relationships 
among the modules of a system. The most basic 
relationship is that a module may call or be called by 
another module. The structurally simplest system con- 
sists of a single module. For more complex systems, 
structural complexity is the sum of the contributions of 
the component modules to structural complexity. These 
potential contributions are occurrences of fanin and 
fanout as noted by Henry and Kafura [9], as well as by 
Belady and Evangelisti [3]. (Fanin is the count of calls to 
a given module. Fanout is the count of calls from a given 
module.) 

In the SEL data analyzed (see Table 1), multiple fanin 
generally confined itself to modules that were simple 
mathematical functions reused throughout the system. 
Consequently, fanin did not prove to be an important 
complexity discriminator. On the other hand, fanout 
proved to be highly sensitive, as indicated in a previous 
study [10]. Counting fanout only also ensures that each 
connection is counted exactly once. Note that lower 
fanout indicates less coupling in the sense that there are 
fewer couples (without regard to their strength [11] or 
type [6]). 

According to this model, a module with a fanout of 
zero contributes nothing to structural complexity. How- 
ever, the distribution of fanout within a system also 
affects complexity. The interconnection matrix repre- 
sentation of partitioning used by Belady and Evangelisti 
[3] suggests that complexity increases as the square of 
connections (fanout). All descendants of a given module
are connected to each other by their common parent. 
Then, for a fixed total fanout, a system in which 
invocations are concentrated in a few modules is more 
complex than one in which invocations are more evenly 
distributed. These considerations lead to the following 
formulation for structural complexity:

    S = \frac{\sum_{i=1}^{n} f_i^2}{n}    (3)

where

    S = structural (intermodule) complexity
    f_i = fanout of module "i"
    n = number of modules in system

This quantity is the average squared deviation of actual 
fanout from the simplest structure (zero fanout). Henry 
and Kafura's term "(fanin * fanout) ** 2" [9] reduces to
fanout-squared when fanin is assumed equal to one (the
nominal case). Similarly, Belady and Evangelisti's
measure of complexity [3] is a function of the number of
nodes (modules) and edges (fanout) in a system or
cluster (partition).
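
The effect of fanout distribution on this measure can be checked
with a short calculation. The following Python sketch is
illustrative only: the function name and the two seven-module
designs are hypothetical and are not SEL data. It computes S as
the mean squared fanout, following the definition above.

```python
def structural_complexity(fanouts):
    """Relative structural complexity S: mean of squared fanout.

    `fanouts` holds one fanout count per module, including
    terminal modules whose fanout is zero.
    """
    if not fanouts:
        raise ValueError("a design needs at least one module")
    return sum(f * f for f in fanouts) / len(fanouts)


# Two hypothetical designs, each with 7 modules and a total fanout of 6.
concentrated = [6, 0, 0, 0, 0, 0, 0]  # one module invokes all the others
distributed = [2, 2, 2, 0, 0, 0, 0]   # invocations spread over three modules

print("concentrated:", structural_complexity(concentrated))
print("distributed: ", structural_complexity(distributed))
```

With the same number of modules and the same total fanout, the
concentrated design scores higher, which is the behavior the
squared term is meant to capture.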

The fanout count defined here does not include calls to
system or standard utility routines, but does include calls 
to modules reused from other application programs. A 
reused module must be examined by the designer to 
determine its appropriateness— as opposed to standard 
utilities that are well understood by developers. 


3.2 Local Complexity 

The internal complexity of a module is a function of 
the amount of work it must perform. The workload 
consists of data items that are input to or output from 
higher or parallel modules. This definition is consistent 
with Halstead’s concept [1] of the minimal representa- 
tion of a program as a function (single operator) with an 
associated set of I/O variables (operands). This work- 
load measure parallels the idea of actual data bindings as 
used by Hutchens and Basili [11].

Then, to the extent that functionality (work) is 
deferred to lower levels, the internal complexity of a 
module is reduced. Averaging the internal complexities 
of a system's component modules produces its local
complexity. Most guidelines for decomposition suggest 
decomposing into units of equal functionality. Assum- 
ing, for simplicity, that the workload of a module is 
evenly divided among itself and subordinate modules 
leads to the following formulation of complexity:

    L = \frac{\sum_{i=1}^{n} \frac{v_i}{f_i + 1}}{n}    (4)

where

    L = local (intramodule) complexity
    v_i = I/O variables in module "i"
    f_i = fanout of module "i"
    n = number of new modules in system

The "+ 1" term represents the subject module's share
of the workload (incidentally, it prevents the divide-by- 
zero condition from arising when a module has no 
fanout). I/O variables include distinct arguments in the 
calling sequence (an array counts as one variable) as 
well as referenced COMMON variables. An earlier 
study [10] indicates that the presence of unreferenced 
COMMON variables does not affect module quality. 
Data item complexity is not considered here (only newly 
developed modules enter into this computation). 
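
As a rough illustration of this definition, the Python sketch
below averages v/(f + 1) over a set of modules. The function name
and the three example modules are hypothetical; only the formula
comes from the text above.

```python
def local_complexity(modules):
    """Relative local complexity L.

    `modules` is a list of (io_variables, fanout) pairs, one per
    newly developed module. Each module's workload is assumed to be
    shared evenly between itself and the modules it invokes, hence
    the division by (fanout + 1).
    """
    if not modules:
        raise ValueError("a design needs at least one module")
    return sum(v / (f + 1) for v, f in modules) / len(modules)


# Hypothetical design: a parent with 24 I/O variables and fanout 2,
# plus two terminal modules with 10 and 6 I/O variables.
design = [(24, 2), (10, 0), (6, 0)]
print(local_complexity(design))

# The same parent with no fanout keeps its entire workload.
print(local_complexity([(24, 0)]))
```

Deferring work lowers a module's own term (24/3 instead of 24/1
for the parent here), at the cost of the added fanout that enters
the structural measure.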

Henry and Kafura [9] used the count of source lines of
code to represent intramodule complexity. However, as 
used in Henry and Kafura [9], no matter how large the 
module, its complexity would be zero if it had no fanout. 
Basili et al. [12] showed source lines of code (size) to be 
highly correlated (r = 0.79) with the number of I/O 


variables (operands). Another earlier study [13] shows 
that high-strength modules [6] tend to be small. Conse- 
quently, the local complexity measure may be an 
indicator of average module strength (or cohesion [6]). 

3.3 Minimizing Complexity 

Design complexity, as defined in the preceding sections, 
can be minimized by minimizing its structural and local 
components. However, these components are not inde- 
pendent. Both measures include fanout. Minimizing 
structural complexity requires minimizing the fanout
from each module. For a given number of both modules
and total fanout, structural complexity is minimized
when fanout is evenly distributed across all modules 
(except terminal nodes, of course). On the other hand, 
local complexity can be minimized by maximizing 
fanout or minimizing variable repetition. 

Repetition occurs whenever a data item appears in
more than one module as a calling sequence argument or 
referenced common variable. Internal uses (including 
CALLs to other modules) do not count as repetition. In 
general, minimizing local complexity will produce 
smaller modules (in terms of executable statements), but 
it also may increase structural complexity
disproportionately. For a given module with a fixed number of I/O
variables, the fanout that contributes minimum
complexity can be determined as follows:

    c = f^2 + v/(f + 1)

where

    c = contribution of given module to total complexity
        per Equations 2, 3, and 4

then

    dc/df = 2f - v/(f + 1)^2

at minimum

    0 = 2f - v/(f + 1)^2

then

    v = 2f(f + 1)^2    (5)

Figure 2 shows a plot of Equation 5 as a step function (to 
reflect the discrete natures of v and /). It identifies the 
fanout that minimizes complexity for possible counts of 
I/O variables. For example, in the range from about 100 
to 200 I/O variables, complexity is minimized with a 
fanout of 3. Since very few modules include as many as 
200 I/O variables, the plot indicates that the commonly 
accepted range of values for fanout (up to 7 ± 2) is 
much too large. Curtis [7] suggests that the popularity of 
this bound derives from a misunderstanding of certain 
psychological studies. This implication is consistent with
an earlier study [10]. Furthermore, Constantine [6]
observes that most programs can be decomposed effec- 
tively into a common structure of three parts: input, 
process, and output. Larger fanouts may indicate too 
rapid decomposition. This result suggests that a fanout
of one is a reasonable value for modules with few I/O 
variables. 
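
Equation 5 can be tabulated directly. The Python sketch below is
one possible reading of the step function in Figure 2, not the
authors' plotting procedure: it assumes that the fanout assigned
to a given count of I/O variables is the largest whole fanout
whose Equation 5 value does not exceed that count. All function
names are hypothetical.

```python
def equation5_variables(f):
    """I/O variable count v at which fanout f satisfies dc/df = 0."""
    return 2 * f * (f + 1) ** 2


def step_fanout(v):
    """Largest whole fanout f with 2f(f + 1)^2 <= v (assumed step rule)."""
    f = 0
    while equation5_variables(f + 1) <= v:
        f += 1
    return f


for f in range(1, 6):
    print(f"fanout {f}: v = {equation5_variables(f)}")

print("fanout for 150 I/O variables:", step_fanout(150))
```

Under this reading, fanout 3 covers counts from 96 up to 200,
which agrees with the range of about 100 to 200 cited above.
Because Equation 5 treats fanout as continuous, a direct
comparison of whole-number fanouts may place the steps at
somewhat different counts.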

In addition to the selection of an appropriate fanout, 
design complexity can also be minimized by reducing 
variable repetition, i.e., by not including variables 
where they are not needed. Rigorous application of the 
principle of information hiding [14] should reduce 
variable repetition and, hence, local complexity. 

Figure 3 shows two design segments of equal
structural complexity: the number and distribution of fanouts
are identical. Each data couple represents a repetition of
the variable "X". Figure 3a traces this variable through
a design following strict top-down decomposition rules.
"X" appears in the higher level modules (A, B, D)
as well as in the lower level modules (C, E). Figure 3b
shows an alternative design with a horizontal transfer of
data that bypasses the higher level modules (for the case
in which modules A, B, and D do not actually use "X").
The local complexity of the intermediate modules (B, D)
in the strict top-down configuration (Figure 3a) exceeds
that of their counterparts in the alternative design (Figure 3b)
because their counts of I/O variables are larger.

Figure 2. Selecting fanout to minimize complexity.

Parameter transfer between hierarchically adjacent 
modules (e.g., from B to A) produces a lower complex- 
ity than transfer via a global area when that is as far as 
the data item goes. For a triplet connection (e.g., from B 
to A to D), the two approaches have the same complex- 
ity (“X” counts twice in each). This implication is 
consistent with the results of an earlier study [10]. 
Because this model emphasizes the number of data 
couples rather than the nature of the coupling mecha- 
nism, it penalizes “tramp data” (data passed through but 
not referenced by a module). 

Rotenstreich and Howden [15] argue that both hori- 
zontal and vertical data flow are essential to good 
design. Appropriate use of horizontal transfers prevents 
data flows from violating levels of abstraction. COM- 
MON blocks provide the only mechanism for horizontal 
data transfers in FORTRAN. Figure 3 shows that 
horizontal flows can reduce the magnitude of the local 
complexity measure in some situations. 

Of course, a less complex design might also be
produced by partitioning the work differently and
restructuring this design. For example, PROC C could
be invoked directly by PROC E (if the nature of the
problem permitted). This simpler structure would also
be reflected in lower values of the complexity measures
defined by this model. (PROCs B and C would each
have fanout of one instead of PROC B having fanout of
two. Thus, structural complexity diminishes.)

Figure 3. Reducing variable repetition to minimize
complexity. (a) Strict top-down structured design.
(b) Lower complexity with lateral transfer.

4. EVALUATION OF COMPLEXITY MEASURES 

The value of the complexity measures defined in the 
preceding sections was evaluated in two ways. First the 
complexity scores for the eight projects were compared 
with a subjective rating of design quality using a 
nonparametric statistical technique. Then the complexity
scores were compared with objective measures of
development productivity and error rate. This section 
presents the results of the two evaluation approaches. 
Productivity and error rates were computed using the 
developed lines of code (DLOC) measure as defined by 
Basili and Freburger [16]. 

Data for this analysis were extracted from the source 
code of eight projects by a specially developed analysis 
tool. However, software developers can easily extract at 
design time the counts of modules, fanout, and I/O 
variables necessary to compute these complexity mea- 
sures. The eight projects studied were ground-based 
flight dynamics systems for spacecraft in near-earth 
orbit. Table 2 summarizes some general characteristics 
of these software systems. The most recent project 
studied was completed in 1981. 

All of these systems were designed and implemented 
to run under the Graphics Executive Support System 
(GESS), an interactive graphics interface [17]. Conse- 
quently, GESS occupies Level 1 of each design hierar- 
chy. GESS manages most external data interfaces for 
these systems. It is not included in the complexity 
calculations. 

4.1 Subjective Quality 

The eight projects were subjectively ranked in order 
from best to worst, in terms of design quality, by a 
senior manager who participated in the development of 
all eight projects. Then, the four best-rated designs were 
classified as "good" while the other four were classified
as “poor.” Table 3 shows the results of that procedure. 
The table also includes the computed complexity mea- 
sures. Note that the four designs subjectively rated as 
“good” also demonstrated the lowest relative complex- 
ity. The expert was not provided with specific criteria 
for “quality,” but later reported that perceived “com- 
plexity” played a major role in assigning scores. 


Table 2. Project Characteristics

           Total      Percent       Size          Error       Produc-
Project    Modules    Reused (a)    (KDLOC) (b)   Rate (c)    tivity (d)

A          158        11             50           8.7         3.5
B          203        34             49           8.0         2.9
C          338        32            106           4.5         4.7
D          259        84             37           4.0         4.7
E          327        24             83           4.5         4.8
F          393        47             79           7.1         4.1
G          199        49             57           7.2         2.3
H          245        43             56           6.6         2.4

(a) Percent of local modules.
(b) Thousands of developed lines of code.
(c) Errors per KDLOC.
(d) Developed lines of code per hour.


Table 3. Design Complexity and Quality

                 Complexity            Design        Quality
Project    S        L        C (a)     Rating (b)    Class

A          24.6      8.2     32.8      5             Poor
B          15.8      9.5     25.3      2             Good
C          11.8     12.1     23.9      3             Good
D          18.4      4.9     23.3      1             Good
E          12.6     10.0     22.6      4             Good
F          22.3      7.3     29.6      6             Poor
G          18.3     10.3     29.1      8             Poor
H          19.2      7.3     26.5      7             Poor

(a) C = S + L as previously defined (Equation 2).
(b) Subjective evaluation (1 = best, 8 = worst).


Although the correspondence between subjective de- 
sign rating and numerical design complexity is not one- 
for-one, if the data are viewed as quality classes, they 
provide persuasive evidence for a relationship. (If one 
uses the Wilcoxon rank sum statistic the probability is 
less than 0.02 that the observed good/poor grouping 
could occur by chance alone.) The objective complexity 
measure appears to capture much of the information that 
a human observer includes in a subjective evaluation of 
design quality. 
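
The rank sum probability quoted above can be checked with a short enumeration. The sketch below assumes a one-sided exact test without ties, which may differ from the procedure the authors actually used; the complexity values are transcribed from Table 3, and the function name is illustrative only.

```python
from itertools import combinations

# Total complexity C per project, transcribed from Table 3.
complexity = {"A": 32.8, "B": 25.3, "C": 23.9, "D": 23.3,
              "E": 22.6, "F": 29.6, "G": 29.1, "H": 26.5}
good = {"B", "C", "D", "E"}  # projects classified "good" in Table 3


def rank_sum_probability(values, group):
    """One-sided exact probability that a randomly chosen group of the same
    size has a rank sum no larger than the observed one (assumes no ties)."""
    ordered = sorted(values, key=values.get)          # lowest complexity first
    rank = {name: i + 1 for i, name in enumerate(ordered)}
    observed = sum(rank[name] for name in group)
    sums = [sum(c) for c in combinations(rank.values(), len(group))]
    return sum(s <= observed for s in sums) / len(sums)


p_value = rank_sum_probability(complexity, good)
print(f"one-sided p = {p_value:.4f}")
```

Because the four "good" designs occupy the four lowest complexity ranks, only one of the 70 possible groupings is as extreme as the observed one, which is consistent with the bound of 0.02 stated in the text.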

4.2 Performance Prediction 

The other test of the value of these complexity measures 
is their ability to predict software development perform- 
ance in terms of the productivity and error rate ulti- 
mately realized by the development team. A more 
complex design will be more difficult to develop into an 
acceptable system. However, let us first define a few 
relevant quantities: 

Developed lines of code— all newly developed source 
lines of code plus 20% of reused source lines of code 
[16]. 

Errors— conceptual mistakes in design or implementa- 
tion. An error may result in one or more faults (code 
changes). These were detected during integration and 
system testing (after unit testing). 

Effort— hours of work by programmers, managers, and 
support personnel directly attributable to a project. 

Productivity— developed lines of code divided by effort 
(in hours). 

Error rate— total errors divided by developed lines of 
code. 
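
The definitions above can be written out as a small sketch. The function names and the sample figures are hypothetical and serve only to illustrate the arithmetic; the error rate is scaled to errors per thousand developed lines (KDLOC), the unit used in Table 2.

```python
def developed_lines(new_lines, reused_lines, reuse_weight=0.20):
    """Newly developed source lines plus 20% of reused source lines."""
    return new_lines + reuse_weight * reused_lines


def productivity(developed, effort_hours):
    """Developed lines of code per hour of effort."""
    return developed / effort_hours


def error_rate(errors, developed):
    """Errors per thousand developed lines of code (KDLOC)."""
    return errors / (developed / 1000.0)


# Hypothetical project, not taken from the SEL data.
dloc = developed_lines(new_lines=40000, reused_lines=50000)
print(dloc)                        # developed lines of code
print(productivity(dloc, 12500))   # developed lines per hour
print(error_rate(300, dloc))       # errors per KDLOC
```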

The developed lines of code metric attempts to account
for the lower cost and error rate attributable to reused
code. Table 2 shows the developed lines of code,
productivity, and error rate for the eight projects. Note
that together these projects represent more than 1,000
individual new modules produced by about 50 different
programmers.

Figure 4. Source of software errors (from Weiss
and Basili [19]). Note: Excludes clerical/transcription
errors.

Designers and researchers commonly assume that
higher complexity increases the propensity for error.
Potier et al. [18] observe that the implementation
process consists largely of translating design specifications
into a programming language. It usually does not
add complexity to a system. Weiss and Basili [19] show
that the bulk (74-82%) of all nonclerical errors reported
in three of these projects were related to design,
although sometimes at very detailed levels. Figure 4
shows the median distribution of errors for the projects
studied by Weiss and Basili [19]. Very few of these
errors are true programming errors. Of course, many
detailed design and implementation errors are detected
during code reading and unit testing (not counted here).
In this context, clerical/transcription errors can be
regarded as random.

Figure 5 illustrates the relationship between design
complexity and error rate. It shows that design complexity
effectively predicts the total error rate for development
projects. Complexity (as measured here) accounts
for fully 60% of the variation in error rate. As seen in
Figure 5, all but one of the points lie very close to the
regression line. In the exceptional case, Project B, the
implementation team consisted of an unusually large proportion of
junior personnel (although its design team was comparable
to those of the other projects). Consequently, it
seems reasonable to find a higher error rate than would
be indicated by design complexity alone.
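
As an illustration of what "accounts for" means here, the share of variation explained by a least-squares line is the squared correlation coefficient. The sketch below applies that formula to the rounded values printed in Tables 2 and 3; this pairing of columns is an assumption, and the result need not reproduce the figure reported in the text, which may rest on unrounded data.

```python
def r_squared(x, y):
    """Fraction of the variance in y explained by a least-squares line on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Projects A through H: total complexity C (Table 3) and error rate (Table 2).
complexity = [32.8, 25.3, 23.9, 23.3, 22.6, 29.6, 29.1, 26.5]
errors_per_kdloc = [8.7, 8.0, 4.5, 4.0, 4.5, 7.1, 7.2, 6.6]

explained = r_squared(complexity, errors_per_kdloc)
print(f"share of error-rate variation explained: {explained:.2f}")
```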

Figure 6 illustrates the relationship between design
complexity and productivity. No clear relationship
emerges. However, as noted elsewhere [20], many
important factors external to the development process
(such as computer use and programmer expertise)
strongly affect productivity. In this case (consistent with
[20]), computer hours per thousand developed lines of
code correlates strongly with the residuals from the
Figure 6 relationship (r = -0.79). Computer support
was provided to these projects only for detailed design,
coding, and testing, so it does measure a different set of
activities. However, the small sample size (at the project
level) inhibits evaluation of a more complex model
incorporating both complexity and computer use.

In this organization, the design team forms the 
nucleus of the implementation and test teams. Additional 
personnel join as they are needed. Thus, the complexity 
measure provides an early indication of the performance 
of the development team as well as of the quality of the 
design. A good design team is likely to be a good 
implementation and testing team, although productivity 
may be difficult to predict. 

5. REPRESENTATION OF DESIGN STRUCTURE 

The numerical quantities defining these complexity
measures are the number of modules, fanout, and I/O
variables. Table 4 shows the distribution of these
measures by hierarchical level for one project. This
design structure can be represented graphically, as
shown in Figure 7, by plotting the cumulative percentage
of these quantities obtained at each level. Kafura and
Henry [21] employed a similar technique to show the
effect of design changes on complexity.

In this and subsequent plots, the design structure (or 
profile) is simplified by combining all utility modules, 
regardless of where they are invoked, into a single 
deepest level of the design. That point is not plotted 
(utility refers to new or reused modules that are invoked 
from several different points within a design but not to 
system or standard utilities). Levels greater than or equal 
to 10 also are combined into a single level to facilitate 
plotting. 
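
The cumulative percentages behind such a profile can be sketched as follows. The level data are transcribed from Table 4, with the utility row left out as described above. Taking each level's total fanout and I/O count as the number of modules times the tabulated module average is an assumption made for this illustration.

```python
# (level, modules, average fanout, average I/O variables), from Table 4.
LEVELS = [
    ("2", 2, 6.5, 45), ("3", 4, 4.8, 9), ("4", 19, 5.6, 29),
    ("5", 93, 2.2, 26), ("6", 62, 2.0, 24), ("7", 54, 1.8, 20),
    ("8", 33, 1.4, 14), ("9", 7, 0.7, 8), (">=10", 2, 0.0, 5),
]


def cumulative_percent(totals):
    """Running share (in percent) of a quantity accumulated level by level."""
    grand_total = sum(totals)
    running, shares = 0.0, []
    for value in totals:
        running += value
        shares.append(100.0 * running / grand_total)
    return shares


modules = [m for _, m, _, _ in LEVELS]
fanout = [m * f for _, m, f, _ in LEVELS]    # assumed: modules x average fanout
io_vars = [m * v for _, m, _, v in LEVELS]   # assumed: modules x average I/O

profile = {
    "modules": cumulative_percent(modules),
    "fanout": cumulative_percent(fanout),
    "io_variables": cumulative_percent(io_vars),
}
for i, row in enumerate(LEVELS):
    print(f"level {row[0]:>4}: modules {profile['modules'][i]:5.1f}%  "
          f"fanout {profile['fanout'][i]:5.1f}%  "
          f"I/O {profile['io_variables'][i]:5.1f}%")
```

Plotting the three lists against level would give one curve per quantity, in the manner of the design profiles discussed in this section.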

As discussed earlier, the conditions that minimize
structural complexity result in an even distribution of
fanout. This produces an increasing growth rate in the
cumulative percentage of total fanout in the initial levels
of the design, followed by a gradual decrease in growth
rate as subtrees terminate. The percentage of modules is
driven by the fanout at the preceding level (minus calls
to utilities). Uneven use of utilities causes the module
line to fail to track fanout. Equation 5 showed that I/O
variables should be proportional to fanout in order to
minimize local complexity. Together, these conditions
define the shape of a good (low relative complexity)
design.

Figure 5. Relationship to error rate.

Figure 7 illustrates Project E, the design with the
lowest relative complexity. It shows three closely fitted
“S”-shaped curves. Figure 8 illustrates Project A, the
design with the highest relative complexity. It shows
three separate and irregular lines. Profiles of the other
six projects fall between these two extremes in
correspondence to their measured complexity.

6. CONCLUSIONS 

The complexity measures proposed in this report are
supported by substantial empirical evidence. The structural
complexity component is similar to measures used
successfully by Belady and Evangelisti [3] and Henry
and Kafura [9] for other languages and application areas.
However, neither of these models, as originally formulated,
fits the SEL data very well. The new model
demonstrated good agreement with subjective assessments
of design quality as well as a numerical measure
of error rate. Moreover, all relevant measures can be
extracted at design time; the Henry and Kafura model
includes a code measure.

Figure 6. Relationship to productivity.


Table 4. Detailed Design Structure for Project E

                      Module Average
Level     Modules     Executable    Fanout    Input/output
                      statements              variables
2         2           91            6.5       45
3         4           37            4.8       9
4         19          59            5.6       29
5         93          67            2.2       26
6         62          59            2.0       24
7         54          59            1.8       20
8         33          37            1.4       14
9         7           19            0.7       8
≥10       2           8             0.0       5
Utility   51          90            2.4       21


Many software development methods, e.g., [22],
encourage trying design alternatives. Because software
developers can easily compute values for these complexity
measures at design time, they seem likely to be useful
for assessing design quality and comparing design
alternatives before committing a design to code. Overall
high-complexity designs, as well as individual high-complexity
modules, can be identified. These measures
could be adapted to support a measures-guided methodology
such as that proposed by Ramamoorthy et al. [23].

Of course, complexity is not the only important 
attribute of software designs. The minimum complexity 
that can be achieved depends on the nature of the 
application and the presence of design constraints. 
Furthermore, design is not a deterministic process. The 
same design approach or method applied by different 
individuals can result in different designs. These complexity
measures help us to answer the question,
“Which is better?” However, it is not enough to
produce a design that shows low complexity scores.
Following a sensible and well-defined design method
ensures that the design problem is responded to while
minimizing complexity. Measures play a supporting role
in the design process.

As Kearney et al. pointed out [24], ill-founded 
reliance on complexity measures can degrade the soft- 
ware development process by rewarding poor program- 
ming practices. The approach to complexity measure- 
ment presented here satisfies the requirements of 
Kearney et al. [24] for effective complexity measures by 
clearly identifying the attributes measured, deriving 
them from a model of the design process, suggesting 
how they can be used in practice, and empirically testing 
their validity. Nevertheless, more work remains to be 
done. 

Three aspects of this current complexity measurement
approach require additional research. First, methods of
incorporating external I/O (e.g., files) into the complexity
measures must be developed. In the systems studied,
much of the external I/O is handled by the GESS standard
interface. Second, the application of the measures
should be extended to designs using different formalisms
intended for different implementation languages. “Modules”
corresponding to FORTRAN subroutines are not a
universal design structure. The SEL has begun to study
the application of these measures to Ada design [25].
Third, the existence of two design complexity components
suggests that two different types and distributions
of the design errors (in addition to programming errors)
also exist, as proposed by Basili and Perricone [26].
That needs to be verified empirically.

Figure 7. Design profile of Project E (lowest complexity).

Finally, Kafura and Reddy [27] showed that similar
complexity measures appeared to be related to software
maintainability. This suggests another new area of
investigation.




D. N. Card and W. W. Agresti 



Figure 8. Design profile of Project A (highest complexity).

Abstract

In this paper we discuss a study aiming at the improve- 
ment of measurement and evaluation procedures used in an 
industrial maintenance environment. We used a general 
evaluation and improvement methodology for deriving a set 
of metrics tailored to the maintenance problems in this par- 
ticular environment. Some of the required maintenance data 
were already collected in this environment, others were sug- 
gested to be collected in the future. We discuss the general 
measurement, evaluation and improvement methodology 
used, the specific maintenance improvement goals important 
to this environment, the set of metrics derived for quantifying
those goals, the suggested changes to the current data
collection procedures, and preliminary analysis results based
on a limited set of already available data. It is encouraging
that based on this limited set of data we are already able to
demonstrate benefits of the proposed quantitative approach
to maintenance. Finally, we outline ideas for automating the
discussed approach by a set of measurement and evaluation
tools. This paper emphasizes the steps of introducing such a
quantitative maintenance approach into an industrial setting
rather than the environment-specific analysis results. The
analysis results are intended to demonstrate the practical
applicability and feasibility of the proposed methodology for
evaluating and improving maintenance aspects in an industrial
environment.

1. Introduction

In this paper we present results from a study trying to intro- 
duce sound measurement and evaluation procedures into an 
industrial maintenance environment. The goal of the study 
has been to investigate the company’s needs for quality 
assessment, and the suitability of the error, change, and 
effort data already collected in this environment for address- 
ing these quality assessment needs. 

First we describe the actual industrial maintenance 
environment which has been the object of this study includ- 
ing the high-level maintenance assessment and improvement 
goals as stated by high-level management (section 2) and the 
goal/question/metric paradigm 1, * 7 used in this study for 
defining and quantifying the maintenance assessment and
improvement goals of interest. The application of this
methodology has resulted in a list of clearly defined
maintenance assessment and improvement goals and quantifiable
questions (section 4) as well as the corresponding data and
metrics (section 5). Until now only a subset of these data
and metrics required to fully address the stated maintenance
goals had been collected (section 6). Based on the needs of
the particular industrial environment, changes to the data
collection and validation process have been suggested for the
future (section 7). Preliminary analysis results for a small
subset of the questions and goals of interest (depending on
the type, amount and quality of data available at the time)
are presented (section 8). It is encouraging that based on
this limited subset of data we are already able to demonstrate
benefits of this quantitative approach to maintenance.
Finally, we outline ideas for automating the proposed
approach by a set of measurement and evaluation tools
(section 9). This paper emphasizes the steps of introducing such
a quantitative maintenance approach into an industrial setting
rather than the environment-specific analysis results.
The analysis results are only included to demonstrate that
the proposed approach actually works in this particular
environment.

This study was supported by a grant from Burroughs Corporation to the
University of Maryland. Computer time was provided in part through
facilities of the Computer Science Center of the University of Maryland.


2. Maintenance Environment

The study was conducted in the maintenance environment of
a major computer company. The maintenance process from
an organizational point of view can be characterized as follows:
Customer Support receives maintenance problems
(mainly) from customers, evaluates them and, whenever
appropriate, forwards them in the form of change requests to
Product Assurance. Product Assurance evaluates the
change requests again and forwards them, whenever
appropriate, to Engineering. The eventually changed products
are sent back to the customers through the same
channels (Product Assurance, Customer Support).

Data are currently being collected during all these
different maintenance steps. Customer Support collects data
for each single problem concerning scheduling (e.g., time of
incoming calls, time of outgoing calls), type of problem (e.g.,
clarification of documentation, operation request; for a complete
list see table 2), priorities of problems, and effort spent
on handling the problem. Product Assurance collects data
for each single change request concerning scheduling, type of
change request, effort spent, and final status (e.g., changed,
change postponed, change rejected including the reason for
rejection). Engineering collects data for each change concerning
scheduling, change effort, and the type of change
performed. Data collection is mandatory in some groups such
as Product Assurance; it is done on a voluntary basis in
other groups such as Engineering. As a result, the
completeness and validity of collected data vary across the
entire maintenance environment. In general it is true that
Customer Support and Product Assurance stress data collection
more than Engineering does.

Although this is a very simplified description of the
maintenance process, it should allow the reader to understand
the different assessment needs of these three
maintenance roles.

The data were used for filing status reports concerning 
the handling of maintenance requests but not (except locally 
in some groups) for overall quality assessment. The purpose 
of this study was to find out whether the already collected 
data are sufficient for assessing the environment-specific
maintenance problems and, if not, to suggest changes to this
data collection process.

The most urgent maintenance assessment and improve- 
ment goals were formulated by corporate representatives of 
the company as follows: 

G1: Examine where the bulk of the company’s maintenance
dollars are being spent and how much is being spent on
individual activities.

G2: Identify the best ways of applying the 20/80 rule* to get
the biggest savings and return on our maintenance dollars.

G3: Identify criteria for when a product is ready for release.

G4: Identify features of product, documentation or support
that provide a wider customer satisfaction.

G5: Identify criteria for when a software product should be
rewritten rather than maintained.

G6: Identify metrics of customer satisfaction that can be
developed based upon existing data.

G7: Develop organizational guidelines for integrating
software quality metrics into the company’s framework of
design, development, and support.

It is obvious that these high-level and complex prob- 
lems can only be assessed by breaking them down into more 
and more simple problems. This refinement process, which 
finally is expected to result in a set of quantitative metrics, is 
supported by a methodology developed by the authors* i,T . 


3. The Goal/Question/Metric Paradigm

The approach to quantification of goals is the 
goal/question/metric paradigm 1, K ** 7 . This paradigm does 
not provide a specific set of goals but rather a framework for 
defining goals and refining them into specific quantifiable 
questions about the software process and product that pro- 
vide a specification for the data needed to help answering 
the goals. 

The paradigm provides a mechanism for tracing the goals of
the collection process, i.e. the reasons the data are being collected,
to the actual data. It is important to make clear, at
least in general terms, the organization’s needs and concerns,
the focus of the current project and what is expected from it.
The formulation of these expectations can go a long way
towards focusing the work on the project and evaluating
whether the project has met those expectations. The need
for information must be quantified whenever possible and
the quantification analyzed as to whether or not it satisfies
the needs. This quantification of the goals should then be
mapped into a set of data that can be collected on the product
and the process. The data should then be validated
with respect to how accurate they are and then analyzed and the
results interpreted with respect to the goals.

* Applying the 20/80 rule means to identify those maintenance problems which can
be fixed easily (with twenty percent of the effort of what would be required to fix
all maintenance problems) but reduce the maintenance overhead drastically (by
eighty percent).

The actual goal/question/metric paradigm is visualized in 
figure 1. 



Figure 1: Goal/Question/Metric Paradigm.


Here there are n goals shown and each goal generates a set of 
questions that attempt to define and quantify the specific 
goal which is at the root of its goal tree. The goal is only as 
well defined as the questions that it generates. Each question
generates a set of metrics (m_i) or distributions of data
(d_i). Again, the questions can only be answered relative to
and as completely as the available metrics and distributions
allow. As is shown in figure 1, the same questions can be
used to define different goals (e.g. Question_6) and metrics
and distributions can be used to answer more than one question
(e.g. m_1 and m_2). Thus questions and metrics are
used in several contexts. 
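
The goal tree described above can be pictured as a small data structure in which questions are shared between goals and metrics between questions. The following is a minimal sketch under that reading; the class names, the helper function, and the sample goals, questions, and metrics are all hypothetical and are not taken from figure 1.

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    name: str
    metrics: list = field(default_factory=list)   # names of metrics or distributions


@dataclass
class Goal:
    name: str
    questions: list = field(default_factory=list)  # Question objects, possibly shared


def goals_using(metric, goals):
    """Trace a collected metric back to every goal it helps to answer."""
    return sorted(g.name for g in goals
                  if any(metric in q.metrics for q in g.questions))


# Hypothetical tree: one question serves two goals, one metric serves two questions.
q_a = Question("Question_a", ["m_1", "d_1"])
q_b = Question("Question_b", ["m_1", "m_2"])
q_c = Question("Question_c", ["d_2"])
goals = [Goal("Goal_1", [q_a, q_b]), Goal("Goal_2", [q_b, q_c])]

print(goals_using("m_1", goals))
print(goals_using("d_2", goals))
```

Keeping the links explicit in this way supports the tracing the paradigm calls for: for any data item one can ask which goals motivated its collection.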

Given the above paradigm, the process of quantifying 
improvement goals consists of three steps: 

(1) Generate a set of goals based upon the needs of 
the organization. 

The first step of the process is to determine what it is you 
want to improve. This focuses the work to be done and 
allows a framework for determining whether or not you 
have accomplished what you set out to do. Sample goals
might address such issues as how to improve the set
of methods and tools to be used in a project with respect
to high quality products, customer satisfaction, productivity,
usability, or whether the product contains the needed
functionality.

(2) Derive a set of questions of interest or hypotheses 
which quantify those goals. 

The goals must now be formalized by making them 
quantifiable. This is the most difficult step in the process 
because it often requires the interpretation of fuzzy terms 
like quality or productivity within the context of the 
development environment. These questions define the 
goals of step 1. The aim is to satisfy the intuitive notion
of the goal as completely and consistently as possible.

(3) Develop a set of data metrics and distributions
which provide the information needed to answer
the questions of interest.

In this step, the actual data needed to answer the questions
are identified and associated with each of the questions.
However, the identification of the data categories is
not always so easy. Sometimes new metrics or data distributions
must be defined. Other times data items can be
defined to answer only part of a question. In this case, the
answer to the question must be qualified and interpreted
in the context of the missing information. As the data
items are identified, thought should be given to how valid
the data item will be with respect to accuracy and how
well it captures the specific question.

In writing down goals and questions, we must begin by
stating the purpose of the improvement process. This purpose
will be in the form of a set of overall goals but they
should follow a particular format. The format should cover
the purpose of the process, the perspective, and any important
information about the environment. The format (in
terms of a generic template) might look like:

• Purpose of Study: 

To (characterize, analyze, evaluate, predict, motivate) the 
(process, product, model, metric) in order to (understand, 
assess, manage, engineer, learn, improve) it. 

• Perspective of Study:

Examine the (cost, effectiveness, correctness, errors,
changes, product metrics, process metrics, reliability, user
satisfaction, etc.) from the point of view of the (developer,
manager, customer, corporate perspective, etc.).

• Environment of Study: 

The environment consists of the following: process factors, 
people factors, problem factors, methods, tools, con- 
straints, etc. 

• Process Questions:

For each process under study, there are several subgoals
that need to be addressed. These include the quality of
use (characterize the process quantitatively and assess how
well the process is performed), the domain of use (characterize
the object of the process and evaluate the knowledge
of the object by the performers of the process), effort of use
(characterize the effort to perform each of the subactivities
of the activity being performed), effect of use (characterize
the output of the process and evaluate the quality of
that output), and feedback from use (characterize the
major problems with the application of the process so that
it can be improved).

Other subgoals involve the interaction of this process with
the other processes and the schedule (from the viewpoint
of validation of the process model).

• Product Questions:

For each product under study there are several subgoals
that need to be addressed. These include the definition of
the product (characterize the product quantitatively) and
the perspective of the evaluation (e.g. reliability or user
satisfaction). The definition of the product includes physical
attributes (e.g. source lines, number of units, executable
lines, control and data complexity, programming language
features, time space), cost (e.g. effort, time, phase,
activity, program), changes (e.g. errors, faults, failures and
modifications by various classes), and the context the product
is supposed to be used in (e.g. customer community,
operational profile). The perspective of the evaluation is
relative to a particular quality (e.g. reliability or user
satisfaction). Thus the physical characteristics need to be
analyzed relative to this quality aspect.


4. Maintenance Goals and Questions

We applied the methodology described in section 3 to specify
the high-level quality assessment and improvement goals
given to us from a corporate perspective (see section 2) more
precisely, and to derive quantifiable analysis questions.
Using the template of section 3 proved to be very helpful.
The entire process of specifying goals and deriving the
evaluation questions was done in very close cooperation with
company representatives from Customer Support, Product
Assurance, and Engineering.

The seven goals for this study are formulated in
terms of the purpose of this study, the perspective of
this study, and important information about the
company’s maintenance environment:

• PURPOSE OF STUDY: Characterize (in the case of goals G1 and G4)
and evaluate (G2, G3, and G5) the maintenance methodology and
motivate (G6 and G7) the use of metrics for the purpose of better
understanding (G1 and G4), management (G2, G3, G5, G6, and G7) and
improvement (G2, G3, G5, G6, and G7).

• PERSPECTIVE: Examine the cost (in the case of goals G1, G2, G5, and
G7), problems (G2), errors and changes (G1 and G5), product and process
metrics (G3, G4, G5, and G6) and the effectiveness (G7) from the
point of view of the manager and corporation.

• ENVIRONMENT:

- Maintenance Process: The customer reports problems (by phone) to
Customer Support; if problems cannot be resolved by Customer
Support, they are forwarded to Product Assurance. Product Assurance
decides whether the reported problem should be fixed. If approved as
a problem to be fixed, it is submitted to Engineering (to be fixed), gets
back to Product Assurance (for fix certification), and is sent back to
Customer Support.

- Maintained Products (for which we had access to data): A retrieval
system (called SYS_1 in the remainder of this paper)
and a compiler (called SYS_2 in the remainder of this
paper).

For each process and product under study, there are 
several subgoals (quality of use, domain of use, effort 
of use, effect of use, and feedback of use); each 
subgoal will be addressed by a number of analysis 
questions (Qi): 

(A) PROCESS RELATED QUESTIONS:

• QUALITY OF USE (characterize the company's maintenance
process and how well it is performed):

Q1: What percent of the problems are handled by Customer Support
without forwarding them to Product Assurance? What is a distribution
of their disposition?

Q2: What percent of change requests forwarded to Product Assurance
do not come from the field? What is a distribution by percent of
where they come from (engineering, field test, etc.) and the reasons
they do not come from the field? What percent of problems
aren't really maintenance problems?

Q3: For change requests rejected by Product Assurance or Engineering:
What are the distributions by

1) closure code,

2) organization responsible for rejection, and

3) schedule by closure code by organization?

Q4: What are characteristics of the test plan performed by engineering
before release? How effective is this test plan?

In more detail: Is the test suite based upon the new or changed
final requirements? Are regression tests performed? Are the tests
based upon the importance and complexity of the requirements?
What criteria exist for the selection of test cases and test data?

Q5: What are test cases and test data for the beta test? To what extent
does it consider the future usage profile? How effective is this
test?

Q6: For each fix: How long after the fix is made is it released to the
customer?

Q7: What is the distribution of faults or customer problems per
organizational unit in total and by various products?

Q8: What is the distribution of faults due to previous changes per
organizational unit in total and by various products?

Q9: What are the distributions of change requests by various subclasses
(fault/modification, rejected/not rejected, error subclasses,
change subclasses)?


• DOMAIN OF USE (characterize the objects of the maintenance
process and the knowledge of the people involved in this
maintenance process):

Q10: What products are available to

- customer support personnel,

- problem evaluator,

- changer,

- change evaluator, and

- the field support?

Q11: What is the knowledge of the people involved wrt

1) the application,

2) the particular product, and

3) the change methodology?


• EFFORT OF USE (characterize the effort to perform each
maintenance activity):

Q12: What is the cost of

- detecting a problem symptom,

- understanding the problem,

- isolating the problem causes,

- designing the change,

- implementing the change,

- testing the change, and

- releasing the change

in terms of computer time, people time, by person category and
machine category?

Q13: What is the calendar time for

- detecting a problem symptom,

- understanding the problem from a customer's viewpoint,

- understanding the problem from an engineering viewpoint,

- isolating the problem causes,

- designing the change,

- implementing the change,

- testing the change, and

- releasing the change?

(Give the max, min, average and by various types of changes!)

• EFFECT OF USE (characterize the output of the maintenance
process and the quality of this output):

Q14: How many and what percent of documents are produced/modified
as a result of the maintenance process (patch, user manual, additional
technical documents, closure form, patch release information
form, advanced technical information form and user letter)?

Q15: How many and what percent of change requests cause a
modification?

Q16: How many and what percent of change requests are related to
errors, environment adaptations, and requirements changes (=
enhancements)?

Q17: How many and what percent of faults are the result of a previous
change?

Q18: What is the average cost of a change overall and by type?

Q19: Having categorized changes by function, having made a change in
a function: How many future requests do we get for the same
function?

Q20: What are characteristics of customer calls over time by type of
question?

Q21: What customer categories exist? Do clusters of customer profiles
(types of complaints, faults, etc.) match these categorization
schemes?

Q22: Is the user satisfied with function, performance, schedule (by a user
satisfaction survey)?


• FEEDBACK FROM USE (characterize the problems with the
application of the maintenance process so that it can be
improved):


Q23: What are the problem areas in the maintenance process by the
following categories:

- distribution of changes by various types,

- distribution of problems that are rejected by various types,

- customer types, and

- time distribution (calendar time, effort) by various change
types, problem types, or maintenance activities?


(B) PRODUCT RELATED QUESTIONS: 


• DEFINITION OF THE PRODUCT (characterize the
product quantitatively):

Q24: What are the physical attributes such as

- size (source lines, number of units, executable lines of
code),

- complexity (control, data),

- programming language features,

- time to develop,

- memory space, and

- execution frequency?


Q25: What is the cost, e.g., effort (time per phase, activity)?

Q26: What are distributions of changes, e.g., errors, faults, failures,
adaptations, and enhancements by various types?

Q27: What is the product's context, e.g., customer community,
operational profile, life cycle model, etc.?

Q28: What are the problem areas in the product by the following
categories:

- distribution of changes by various types,

- distribution of problems that are rejected by various
types,

- customer types, and

- time distribution (calendar time, effort) by various change
types, problem types, or maintenance activities?

Each individual evaluation goal is quantifiable via a subset of
these 28 evaluation questions. In table 1 the interrelationship
is visualized in the form of a goal-question matrix.


5. Maintenance Data & Metrics

In this section we discuss the types of maintenance data
that have to be collected in order to answer each of the
evaluation questions derived in section 4.

The data (Di) are categorized depending on which
maintenance aspect (Customer Support, Product Assurance,
or Engineering) is mainly affected. For each data item it is
indicated whether and how it can be retrieved from currently
maintained data bases, i.e., whether it is explicitly available
(+), it is not explicitly available but can be derived from
other data with reasonable effort (o) or with a great deal of
effort (oo), or it is not available at all (-).


(1) CUSTOMER SUPPORT ORIENTED MAINTE- 
NANCE DATA: 

For each problem reported by customers (phone calls):

D1 (+): customer identification

D2 (oo): customer type

D3 (+): customer support center identification

D4 (o): problem description

D5 (+): whether a problem resulted in a change request (Y/N)

D6 (oo): connection between customer problem and change
request number

D7 (+): identification of affected system/product

D8 (-): identification of affected product functions

D9 (+): schedules for each activity associated with a customer problem


(2) PRODUCT ASSURANCE ORIENTED 
MAINTENANCE DATA: 

For each problem reported by a change request:

D10 (+): identification of the organization that filled out the change
request (customer support, engineering, field test, etc.)

D11 (+): identification of system/product affected

D12 (+): customer identification

D13 (-): customer type

D14 (+): identification of Product Assurance center in charge

D15 (o): concise problem description

D16 (o): information whether a change request was rejected (Y/N)

D17 (+): final change request status (= closure code)

D18 (-): information by whom (Product Assurance, Engineering) closure
code was set

D19 (+): schedules for each maintenance activity

D20 (+): information whether it is a fault, adaptation, or enhancement


(3) ENGINEERING ORIENTED MAINTENANCE 
DATA: 

For each actually performed change:

D21 (+): identification of the engineering group in charge

D22 (-): information about fault types (for example control, data,
computation, etc.)

D23 (o): information whether a fault was caused by a previous change
(Y/N)

D24 (o): information which product units (modules) were affected by a
change (in terms of lines_of_code or identification of
modules)

D25 (-): effort in computer time in total or per phase, change activity

D26 (-): effort in people time in total or per phase, change activity

D27 (+): schedule for each change activity (in calendar days)

D28 (o): percent of code, documents, forms changed

D29 (o): product size

D30 (o): product complexity

D31 (-): memory space


The following question-data matrix (see table 2) shows which
of the 31 different types of data are required as a minimum
to answer each of the previously listed 28 questions:

The questions enclosed in parentheses have to be answered purely by subjective data.


The complete refinement process from the original goals over
questions to the data/metrics can be traced by combining
tables 1 and 2.
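
The step from questions to data can be traced mechanically once the availability codes are recorded. The following Python sketch is only an illustration under stated assumptions: the availability codes are the ones listed for the data items above, but the mapping from questions to required data (`REQUIRED_DATA`) is hypothetical, because the entries of table 2 are not reproduced here, and the three-way classification is an assumed simplification rather than the authors' procedure.

```python
from enum import Enum


class Availability(Enum):
    EXPLICIT = "+"    # explicitly available
    DERIVABLE = "o"   # derivable with reasonable effort
    COSTLY = "oo"     # derivable with a great deal of effort
    MISSING = "-"     # not available at all


# Availability codes as listed for the data items in this section.
AVAILABILITY = {
    "D17": Availability.EXPLICIT,
    "D23": Availability.DERIVABLE,
    "D25": Availability.MISSING,
    "D26": Availability.MISSING,
}

# HYPOTHETICAL question-to-data mapping, for illustration only;
# the real mapping is the question-data matrix (table 2).
REQUIRED_DATA = {
    "Q12": ["D25", "D26"],
    "Q15": ["D17"],
    "Q17": ["D23"],
}


def answerability(question):
    """Classify a question by the availability of its required data."""
    codes = [AVAILABILITY[item] for item in REQUIRED_DATA[question]]
    if any(code is Availability.MISSING for code in codes):
        return "not answerable"
    if all(code is Availability.EXPLICIT for code in codes):
        return "directly answerable"
    return "answerable after derivation"


for q in sorted(REQUIRED_DATA):
    print(q, "->", answerability(q))
```

A question is treated as unanswerable as soon as one required data item is missing, which mirrors how missing effort data block all effort-related questions.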


6. Availability and Validity of Data

In the previous section it was indicated what data are
needed for answering the questions of interest. We also
indicated the degree to which those data are
already available inside the company (+, o, oo, -).

Interpreting the question/data matrix together with the 
availability and validity of the company’s data the following 
conclusions can be drawn: 

- Questions Q6, Q13, Q15, Q16, Q17, Q20 are completely
answerable.

- Questions (Q4), (Q5), (Q10), (Q11), (Q22) will not be
answered based on data collected via regular data collection
forms, but by subjective data from interviews.

- Questions Q23 and Q28 require no data; they are answered
by interpreting the results of more basic questions.

- All questions related to change effort (Q12, Q18, Q25) cannot
be answered because (at least in the case of SYS_1 and
SYS_2) these data were listed as optional on the data collection
form and therefore only listed on about 10% of all
forms.

- All other questions are (at least partially) answerable.

7. Improvement of Data Collection

Based on the company’s interests as documented by the
high-level problems (see section 2) and the refined set of
evaluation questions (see section 4), and the partial lack of
valid data available to analyze those questions, the following
recommendations for changing the data collection process are
being made:

- A uniform data collection method and data base should be
defined.

Some data items are interpreted differently by different
people. Each organizational unit inside the maintenance
environment has its own data base format. This
fact makes it difficult to assess maintenance problems from
global views. It is, for example, difficult to analyze engineering
data from various sites or the complete life cycle of
maintenance problems starting at Customer Support
throughout Product Assurance and Engineering.

- A maintenance task should be viewed as a single entity in
this data base, and it should be traceable through all its
phases (Customer Support, Product Assurance, Engineering).
Due to the "bottom-up" development of individual
data bases, each data base contains only those data important
to the individual organization.

The only solution seems to be a central data base that
contains all information concerning each maintenance task
starting from the first phone call and ending with its final
resolution.
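
To make the idea of a single traceable entity concrete, the following Python sketch shows one possible record layout. It is a minimal sketch, not the company's data base design: the class names, field names, and the sample task are all hypothetical, and only the three phase names come from the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional

PHASES = ("Customer Support", "Product Assurance", "Engineering")


@dataclass
class PhaseRecord:
    phase: str                           # one of PHASES
    start_day: int                       # calendar day the phase started
    end_day: int                         # calendar day the phase ended
    staff_hours: Optional[float] = None  # None when effort was not reported


@dataclass
class MaintenanceTask:
    task_id: str
    product: str
    records: List[PhaseRecord] = field(default_factory=list)

    def add(self, record):
        if record.phase not in PHASES:
            raise ValueError("unknown phase: " + record.phase)
        self.records.append(record)

    def calendar_days(self):
        """Elapsed calendar time from first contact to final resolution."""
        first = min(r.start_day for r in self.records)
        last = max(r.end_day for r in self.records)
        return last - first

    def total_staff_hours(self):
        """Total effort, or None if any phase did not report effort."""
        if any(r.staff_hours is None for r in self.records):
            return None
        return sum(r.staff_hours for r in self.records)


# Hypothetical task followed from the first phone call to its resolution.
task = MaintenanceTask("T-001", "SYS_1")
task.add(PhaseRecord("Customer Support", 0, 2, staff_hours=1.5))
task.add(PhaseRecord("Product Assurance", 2, 5, staff_hours=4.0))
task.add(PhaseRecord("Engineering", 5, 12))  # effort not reported
print(task.calendar_days(), task.total_staff_hours())
```

Returning `None` for the total when one phase lacks effort data makes the gap visible instead of silently under-reporting the cost of the task.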

- It is mandatory to collect engineering data (effort in
staff_hours).

Engineering data are crucial for determining maintenance
problems due to product quality problems (e.g., bad structure).

- Development data (errors, changes, tests, etc.) should be 
collected. 

Collection of development data has to start now. As soon 
as the identification of the maintenance problems is com- 
pleted, the impact of product quality and development 
methodology on these problems has to be analyzed. In 
order to do this, data characterizing the development pro- 
cess are needed. 


8. Preliminary Analysis Results

In order to demonstrate the benefits of quantitative assess- 
ment of maintenance we used the data collected at the time 
to answer some of our maintenance questions listed in sec- 
tion 4. We had data available for two commercial systems 
SYS_1 and SYS_2 (a retrieval system and a compiler). We
had maintenance data available from the first two quarters 
of 19So. 


In section 6 we outlined the questions which could be
answered based on the data available. In the following we
present preliminary analysis results of those questions in the
context of the originally posted high-level corporate maintenance
problems (1) to (5) as listed in section 2.


(G1) Identify where the bulk of the company’s
maintenance dollars are being spent and how much
is being spent on individual activities:


This goal area can be addressed by the following 
questions (see section 4): 




• Question 20 (What are characteristics of customer calls
over time by type of question?) --> Table 3

The average number of calls per problem is about 4.
The most frequent problems are operation questions,
capability features, and clarification of documentation
(in the case of SYS_1) or operation faults (in the case of
SYS_2). The most costly problems (in terms of number of
calls) are documentation faults, system software, and
operation faults (in the case of SYS_1), and clarification
of documentation, capability features, operation questions,
and pre-sales requests (in the case of SYS_2).

• Question 1 (What percent of problems are not reported
as change requests? What is a distribution of their
disposition?) --> Table 4

Overall only about two percent of all problems recorded
by Customer Support resulted in change requests (3 out
of 177 for SYS_1, 3 out of 152 for SYS_2).

The disposition of problems not reported as change
requests in terms of "type of call" is as follows:


The bulk of maintenance problems handled by Customer
Support is spent for "operation requests" and
"operation faults" in the case of SYS_2; in the case of
SYS_1 we can identify two additional problem sources:
problems due to faults of underlying layers (systems
software and hardware) and problems due to bad documentation
(almost 20% of all problems!).

Table 3: Calls/Problems by Call-Type

                               |            SYS_1             |            SYS_2
call-type                      | calls  problems     calls/   | calls  problems     calls/
                               |                     problem  |                     problem
-------------------------------+------------------------------+------------------------------
unknown type                   |    5    2 (1.1%)      2.5    |    -       -           -
clarify document               |  130   35 (19.8%)     3.7    |   34    7 (4.6%)      4.9
operation question             |  172   46 (26.0%)     3.7    |  378   78 (51.3%)     4.8
pre-sales request              |    7    2 (1.1%)      3.5    |    9    2 (1.3%)      4.5
capability, feature            |   88   30 (16.9%)     2.9    |   84   17 (11.2%)     4.9
other                          |   43   13 (7.3%)      3.3    |   61   19 (12.5%)     3.2
document fault                 |    7    1 (0.6%)      7.0    |    -       -           -
operation fault                |   50   10 (5.6%)      5.0    |   44   20 (13.2%)     2.2
application SW change request  |    -       -           -     |    3    1 (0.7%)      3.0
application SW fault           |    4    1 (0.6%)      4.0    |    -       -           -
system SW fault                |   86   15 (8.5%)      5.7    |   15    4 (2.6%)      3.8
system SW change request       |   14    3 (1.7%)      4.7    |    6    2 (1.3%)      3.0
instruction fault              |    7    2 (1.1%)      3.5    |    -       -           -
HW fault                       |   67   17 (9.6%)      3.9    |    5    2 (1.3%)      2.5
AVERAGE                        |                       3.7    |



                               |            SYS_1             |            SYS_2
call-type                      | calls  problems     calls/   | calls  problems     calls/
                               |                     problem  |                     problem
-------------------------------+------------------------------+------------------------------
unknown type                   |    5    2 (1.1%)      2.5    |    -       -           -
clarify document               |  130   35 (19.8%)     3.7    |   34    7 (4.6%)      4.9
operation question             |  172   46 (26.0%)     3.7    |  378   78 (51.3%)     4.8
pre-sales request              |    7    2 (1.1%)      3.5    |    9    2 (1.3%)      4.5
capability, feature            |   88   30 (16.9%)     2.9    |   84   17 (11.2%)     4.9
other                          |   43   13 (7.3%)      3.3    |   61   19 (12.5%)     3.2
document fault                 |    7    1 (0.6%)      7.0    |    -       -           -
operation fault                |   50   10 (5.6%)      5.0    |   44   20 (13.2%)     2.2
application SW fault           |    4    1 (0.6%)      4.0    |    -       -           -
system SW fault                |   86   15 (8.5%)      5.7    |   15    4 (2.6%)      3.8
instruction fault              |    7    2 (1.1%)      3.5    |    -       -           -
HW fault                       |   67   17 (9.6%)      3.9    |    5    2 (1.3%)      2.5
TOTAL                          |  666  174/177 (98.3%) 3.7    |  630  149/152 (98%)   4.1

Table 4: Non-forwarded Calls/Problems by Call-Type


• Question 2 (What percent of problems aren’t really
maintenance problems?) --> Table 5


Table 5: Portion of Real Maintenance Problems

                                 SYS_1     SYS_2
Number of total problems          177       152
Number of maintenance problems     80       116
Percentage                       45.2 %    76.3 %


Not all of the problems reported to Customer Support
are really maintenance problems. There are, for example,
lots of requests from different divisions inside the
company. From a global view, all the effort spent in
Customer Support is charged as maintenance effort. In
the case of SYS_1, only about 45% of all problems (80
out of 177), and in the case of SYS_2, only about 76%
of all problems (116 out of 152) are really maintenance
problems.
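
The percentages quoted for Questions 1 and 2 follow directly from the problem counts. The short Python check below is only a worked example of that arithmetic; the helper name `percent` is ours, and the counts are the ones stated in the text.

```python
def percent(part, total):
    """Share of `part` in `total`, as a percentage rounded to one decimal."""
    return round(100.0 * part / total, 1)


# Real maintenance problems out of all problems (Question 2).
print(percent(80, 177))    # SYS_1
print(percent(116, 152))   # SYS_2

# Problems forwarded as change requests (Question 1): 3 per system.
print(percent(3 + 3, 177 + 152))
```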

• Question 3 (What is the distribution of rejected change
requests by closure code?) --> Table 6

The distribution of rejected change requests by closure 
code is as follows: 


Table 6: Rejected Change Requests by Closure Code

closure code                   SYS_1   SYS_2
need additional information      11      11
not reproducible                  1       -
no fix scheduled                  3       2
already fixed                    45      25
forwarded to ...                  -       2
works as intended                 6       1
works as documented               -       3
incorrect documentation           2       -
operation problem                 1       1
document required                 1       -
not retrofit                      2       8
other                             -       2


• Question 12 (What is the cost of ?) 

Because we have no effort data concerning the Product 
Assurance and engineering aspects of the maintenance 
process, we only could analyze effort as far as Customer 
Support was concerned: 

The cost for each individual maintenance problem (as 
far as Customer Support is concerned) can be character- 
ized 


                               |            SYS_1             |            SYS_2
call-type                      | time    problems   time/     | time    problems   time/
                               | (mins)             problem   | (mins)             problem
-------------------------------+------------------------------+------------------------------
unknown type                   |    52       2       26.0     |     -       -         -
clarify document               |            35       22.6     |   247       7       35.3
operation question             |  1203      46       26.1     |  3723      78       47.7
pre-sales request              |    36       2       18.0     |   211       2      105.5
capability, feature            |   738      30       24.6     |   747      17       44.0
other                          |   247      13       19.0     |   813      19       42.8
document fault                 |    43       1       43.0     |     -       -         -
operation fault                |   303      10       30.3     |   522      20       26.1
application SW change request  |     -       -         -      |    20       1       20.0
application SW fault           |    53       1       53.0     |     -       -         -
system SW fault                |   508      15       33.8     |    78       4       18.4
system SW change request       |   167       3       55.8     |     8       2        4.0
instruction fault              |    13       2        8.6     |     -       -         -
HW fault                       |   327      17       19.3     |    33       2       16.6
AVERAGE                        |                              |

Table 7: On-Line Spent Effort by Call-Type


                               |            SYS_1             |            SYS_2
call-type                      | time    problems   time/     | time    problems   time/
                               | (mins)             problem   | (mins)             problem
-------------------------------+------------------------------+------------------------------
clarify document               |   585      35       19.0     |   305       7       43.6
operation question             |  2317      46       50.4     |  4052      78       52.1
pre-sales request              |    45       2       22.5     |   130       2       55.0
capability, feature            |  1105      30       35.8     |   855      17       50.3
other                          |   240      13       18.5     |  1810      19       95.3
document fault                 |   117       1      117.0     |     -       -         -
operation fault                |   210      10       21.0     |    75      20        3.8
application SW fault           |   330       1      330.0     |     -       -         -
system SW fault                |  1125      15       75.0     |   708       4      180.0
system SW change request       |   115       3       38.3     |   336       2      157.5
instruction fault              |    20       2       10.0     |     -       -         -
HW fault                       |   780      17       45.0     |    55       2       33.6
AVERAGE                        |                     40.5     |                     55.7

Table 8: Off-Line Spent Effort by Call-Type




- by the number of phone calls per problem:

The average number of calls (interactions with the
customer) per problem is about 4 (SYS_1: 3.7, SYS_2:
4.1) according to table 4.

The most crucial problems in SYS_1 in terms of
number of calls are: documentation faults (7 calls per
problem), operation faults (5 calls per problem), and
system software faults (5.7 calls per problem). In the
case of SYS_2, the most crucial problems are: documentation
clarifications (4.9 calls per problem), operation
requests (4.8 calls per problem), pre-sales requests
(4.5 calls per problem), and capability/feature
requests (4.9 calls per problem).

- by the effort spent on-line (time spent talking to the
customer on the phone --> Table 7):

The average effort per problem spent on-line is about
30 minutes.

In the case of SYS_1, most on-line effort is spent for
documentation problems (43 minutes per problem),
application software faults (53 minutes per problem),
and system software faults (56 minutes per problem).
In the case of SYS_2, most on-line effort is spent for
pre-sales requests (105 minutes per problem).

- by the effort spent off-line (time spent other than talking
to the customer on the phone --> Table 8):

The average effort per problem spent off-line is about
45 minutes.

In the case of SYS_1, the most off-line effort is spent
for documentation problems (117 minutes per problem)
and application software faults (330 minutes). In
the case of SYS_2, the most off-line effort is spent for
system software faults (180 minutes per problem).
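
Each of the three cost characterizations above is a simple ratio per call type: calls per problem, or minutes per problem. The Python sketch below recomputes two SYS_2 rows as a worked example. The figures are as read from Tables 4 and 7; the dictionary layout and the function name `per_problem` are assumptions made for illustration.

```python
# SYS_2 figures as read from Tables 4 and 7:
# (calls, problems, on-line minutes) per call type.
SYS_2 = {
    "operation question": (378, 78, 3723),
    "operation fault": (44, 20, 522),
}


def per_problem(total, problems):
    """Average amount per problem, rounded to one decimal as in the tables."""
    return round(total / problems, 1)


for call_type, (calls, problems, minutes) in SYS_2.items():
    print(call_type,
          per_problem(calls, problems), "calls/problem,",
          per_problem(minutes, problems), "on-line minutes/problem")
```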

(G2) Identify the best ways of applying the 20/80
rule to get the biggest savings and return on our
maintenance dollars:

Although we have no final results concerning this
matter, a careful interpretation of the results related
to goal (G1) indicates that, for instance, better documentation,
in the case of SYS_1, could save a big percentage
of maintenance problems. In a paper not
related to this study an analysis of software maintenance
changes is reported 10; the authors aim at the
development of metrics for predicting where those
changes might occur. Such metrics might help save
dollars by concentrating resources on subsystems or
modules which can be expected to require many
changes.
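
One way to look for 20/80 effects in the available data is to rank call types by problem count and find the smallest set that covers a given share of all problems. The Python sketch below does this for the SYS_2 problem counts as read from Table 4. It is an illustrative analysis we add here, not one reported by the study, and the function name and the 80% threshold parameter are assumptions.

```python
def pareto_categories(counts, share=0.8):
    """Return the largest categories that together reach `share` of the total."""
    total = sum(counts.values())
    selected, covered = [], 0
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    for name, count in ranked:
        if covered >= share * total:
            break
        selected.append(name)
        covered += count
    return selected, covered / total


# Non-forwarded SYS_2 problems per call type, as read from Table 4.
SYS_2_PROBLEMS = {
    "clarify document": 7,
    "operation question": 78,
    "pre-sales request": 2,
    "capability, feature": 17,
    "other": 19,
    "operation fault": 20,
    "system SW fault": 4,
    "HW fault": 2,
}

top, covered = pareto_categories(SYS_2_PROBLEMS)
print(top, round(covered, 3))
```

With these counts, four of the eight call types cover about 90% of the problems, and operation questions alone account for more than half.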

(G3) Identify criteria for when a product is ready 
for release: 

This question can only be answered if we know more 
about the type of problems and effort spent in 
engineering before release (question Q4) and about the 
type and problems during field test (question Q5). 

(G4) Identify features of product, documentation or 
support that provide a wider customer satisfac- 
tion: 

This question can be addressed by designing a customer
questionnaire. Some of the technical problems
definitely have impact on the customer’s satisfaction, 
such as the high number of documentation-related 
problems (in the case of SYS_1) or not being able to 
keep promised dates for calling customers back. 

(G5) Identify criteria for when a software product
should be rewritten rather than maintained:

Unfortunately there are no data collected indicating 
explicitly which parts (modules, subsystems) of a pro- 
duct were affected (question Q26) or whether a problem 
is due to a previous change (question Q8). 

The only way to address this question by using the 
currently available data is to evaluate the actual patch 
where the actual lines changed are listed. A paper not 
related to this study indicates that complexity metrics 
characterizing the locality of changes might be a promis- 
ing metric for characterizing the suitability of parts of a 
software system for maintenance purposes 11.

(G6) Identify metrics of customer satisfaction that
can be developed based upon existing data:

Based upon the results concerning goal G4 we hope to
be able to develop metrics for customer satisfaction.
Although it is too early to expect reliable metrics, candidate
metrics might include aspects such as ability to
keep promised schedules for dealing with maintenance
problems or the frequency of similar (at least from the
customer’s point of view) maintenance problem reports.

(G7) Develop organizational guidelines for integrating
software quality metrics into the company’s
framework of design, development, and support:

This goal represents the second step after having understood
the maintenance problems and identified possible
improvements. Procedures for monitoring quality and
productivity have to be established throughout the
development and maintenance of software products; the
prescribed data and metrics should be used for management
and motivation purposes and improved. Before
this problem can be addressed in a satisfactory way,
many more and different analyses have to be performed;
in particular, data concerning the development phase of
products have to be collected in order to identify the
impact of the particular development process on maintainability.
In a paper not related to this study,
interesting approaches for predicting the required customer
support for a particular system were presented*.
The prediction approach utilized development metrics
among others.


9. Measurement and Evaluation Tools

In order to apply the proposed quantitative assessment 
approach practically, data collection and validation pro- 
cedures as well as evaluation procedures need to be 
automated. A tool system was proposed integrating many 
tools already available in this environment. The whole tool 
system needs to be implemented in a decentralized fashion 
around a central data base. It has to provide different inter- 
faces to different maintenance groups, limiting each group 
only to data relevant to their specific task, presenting the 
data in a helpful way. Independent of this company-specific
project, a research project at the University of Maryland is 
aiming at the development of a comprehensive approach to 
automating measurement and evaluation in the context of 
software projects which include support of the generation of 
goals and questions and the project-specific interpretation of 
measurement results 1 *. 


10. Conclusions

The objective of this study has been to demonstrate the 
benefits of assessing the software maintenance process in a 
quantitative way for the purpose of improvement. We have 
been able to show the applicability of the 

goal/question/metric paradigm to this complex problem 
domain and derive first analysis results based on a very lim- 
ited subset of available data. The long-range benefits can be 
expected to be much more significant provided the derived 
set of data are collected in the future and interpreted within 
the proper context of maintenance questions and goals. In 
this paper we have not addressed the psychological problems 
involved in trying to introduce quantitative approaches into 
a traditional maintenance environment. The interested 
reader is referred to a book describing Hewlett Packard’s 
experience (including psychological problems of motivating 
project personnel and higher-level management) from intro- 
ducing metrics into their daily software production environ- 
ment*. 

It was surprising, even to us, how many characteristics
of the maintenance process could be made visible by analyzing
the limited set of data available at the time. This visibility
of characteristics might be helpful in communicating
problems in a more objective and convincing way.


The analysis results underline the importance of viewing
software maintenance not as an isolated activity but as 
integrated into the overall software life cycle. We can 
improve the effectiveness of maintenance procedures by 
purely analyzing the maintenance process. However, we will 
never reduce the overall effort (and money) spent for mainte- 
nance below a certain limit if we cannot make sure that 
software products fulfill certain quality requirements at the 
time of delivery (start of maintenance). Low quality products 
will always cause maintenance problems. Accepting this fact 
will lead us to establish quality criteria for a product to be 
released to customers and, thereby, entering the maintenance 
phase. As a consequence, developers could develop guidelines 
for how to achieve those criteria and metrics to evaluate the 
degree to which those criteria are actually met. Altogether 
this would allow us to develop better maintainable products 
in the first place or, at least, allow us to predict certain 
maintenance problems at the beginning of maintenance. 
Additional benefits of collecting maintenance data are to 
provide a better basis for judging customer satisfaction, the 
company's image, and marketing.

If we want to reduce the overall maintenance effort we
need to apply the assessment and improvement procedures
introduced in this paper to development as well as maintenance
of a product. This requires the availability of development
data (as implied by the evaluation questions in section 4)
in addition to maintenance data. As long as we do
not assess the overall software life cycle, problems will shift
from design to coding, coding to testing, and development to
maintenance. It is a well known fact that the really serious
maintenance problems originate during the prior development
of the product; the identification of these real causes of
maintenance problems will result in significant improvements
of maintenance.

Resource Utilization during Software Development

This paper discusses resource utilization over the life cycle of
software development and discusses the role that the current
"waterfall" model plays in the actual software life cycle.
Software production in the NASA environment was analyzed to
measure these differences. The data from 13 different projects
were collected by the Software Engineering Laboratory at
NASA Goddard Space Flight Center and analyzed for similarities
and differences. The results indicate that the waterfall
model is not very realistic in practice, and that as technology
introduces further perturbations to this model with concepts like
executable specifications, rapid prototyping, and wide-spectrum
languages, we need to modify our model of this process.


1. INTRODUCTION

As technology impacts on the way industry builds 
software, there is increasing interest in understanding 
the software development model and in measuring both 
the process and the product. New workstation technol- 
ogy (e.g., PCs, CASE tools), new languages (e.g., Ada, 
requirements and specification languages, wide-spec- 
trum languages), and techniques (e.g., prototyping, 
object-oriented design, pseudocode) are affecting the 
way software is built, which further affects how man- 
agement needs to address these concerns in controlling 
and monitoring a software development. 

Most commercial software follows a development 
cycle often referred to as the waterfall cycle. While 
there is widespread dissatisfaction with this as a model 
of development, there have been few quantitative studies 
investigating its properties. This paper addresses this 
problem and whether the waterfall chart is an appropri- 
ate vehicle to describe software development. Other 
models, such as the spiral model and value chaining, 
have been described, and techniques like rapid prototyp- 
ing have been proposed that do not fit well with the 
waterfall chart [1,2]. This paper presents data collected
from 13 large projects developed for NASA Goddard
Space Flight Center that shed some light on this model of
development.

Data about software costs, productivity, reliability, 
modularity, and other factors are collected by the 
Software Engineering Laboratory (SEL), a joint re- 
search project of NASA/GSFC, Computer Sciences 
Corporation, and the University of Maryland, to im- 
prove both the software product and the process for 
building such software [3]. It was established in 1976 to 
investigate the effectiveness of software engineering 
techniques for developing ground support software for 
NASA [4]. 

The software development process at NASA, as well
as in most commercial development environments, is
typically product-driven and can be divided into six major
life-cycle activities, each associated with a specific “end
product” [5, 6]: requirements, design, code and unit
test, system integration and testing, acceptance test, and
operation and maintenance. In order to present consistent
data across a large number of projects, this paper
focuses on the interval between design and acceptance 
test and involves the actual implementation of the system 
by the developer. 

In this paper, we will use the term “activity” to refer
to the work required to complete a specific task. For 
example, coding activity refers to all work performed in 
generating the source code for a project, the design 
activity refers to building the program design, etc. On 
the other hand, the term “phase” will refer to that 
period of time when a certain work product is supposed 
to occur. For example, coding phase will refer to that 
period of time during software development when 
coding activities are supposed to occur. It is closely 
related to management-defined milestone dates between 
the critical design review (CDR) and the code review. 
But during this period other activities may also occur. 
For example, during the coding phase, design activity is 
still happening for some of the later modules that are 
defined for the system and some testing activity is 
already occurring with some of the modules that were 
coded into the source program fairly early in the
process.

In the NASA/GSFC environment that we studied, the
software life cycle follows this fairly standard set of
activities [7]:

1. The requirements activity involves translating the 
functional specification consisting of physical attrib- 
utes of the spaceo-aft to be launched into require- 
ments for a software system that is to be built. A 
functional requirements document is written for this 

system. 

2. A design activity can be divided into two subactivi- 
ties: preliminary design activity and detailed design 
activity. During preliminary design, the major 
subsystems are specified, and input-output interfaces 
and implementation strategies are developed. During 
detailed design , the system architecture is extended 
to the subroutine and procedure level. Data structures 
and formal models of the system are defined. These 
models include procedural descriptions of the sys- 
tem; data flow descriptions; complete description of 
ail user input, system output, input-output files, and 
operational procedures; functional and procedural 
descriptions of each module; and complete descrip- 
tion of all internal interfaces between modules. At 
this time a system test plan is developed that will be 
used later. The design phase typically terminates 
with jhe CDR. 

3. The coding and unit test activity involves the 
translation of the detailed design into a source 
program in some app ropriate programming language 
(us uall y Fortran, although there is some movement to 
Ada). Each programmer will unit test each module 
for apparent correctness. When satisfied, the pro- 
grammer releases t he mo dule to the system libraian 
for configurat ion con trol. 

4. The system integration and test activity validates 
that the completed system produced by the coding 
and unit test activity meets its specifications. Each 
module, as it is completed, in integrated into the 
growing system, and an integration test is performed 
to make sure that the entire package performs as 
expected. Functional testing of end-to-end system 
capabilities is performed according to the system test 
plan developed as part of preliminary design, 

5. In the acceptance test activity, a separate acceptance 
test team develops tests based on functional specifica- 
tions for the system. The development team provides 
assistance to the acceptance test team. 

6. Operation and maintenance activities begin
after acceptance testing when the system becomes
operational. For flight dynamics software at
NASA, these activities are not significant with
respect to the overall cost. Most software that is
produced is highly reliable. In addition, the flight
dynamics software is usually not “mission critical,”
in that a failure of the software does not mean
spacecraft failure but simply that the program has
to be rerun. In addition, many of these programs
(i.e., spacecraft) have limited lifetimes of 6
months to about 3 years, so the software is not
given the opportunity to age.

The waterfall model makes the assumptions that all
activity of a certain type occurs during the phase of that
same name and that phases do not overlap. Thus all
requirements for a project occur during the requirements
phase; all design activity occurs during the design phase.
Once a project has a design review and enters the coding
phase, then all activity is coding. Since many companies
keep resource data based on hours worked by calendar
date, this model is very easy to track. However, as
Figure 1 shows, activities overlap and do not occur in
separate phases. We will give more data on this later.

2. THE WATERFALL CHART IS ALL WET 

Table 1 summarizes the raw data on the 13 projects
analyzed in this paper. They are all fairly large flight
dynamics programs ranging in size from 15,500 lines of
Fortran code to 89,513 lines of Fortran, with an average
size of 57,890 lines. The average work on these projects
was 89.0 staff-months; thus, all represent significant
effort.
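
The staff-month column of Table 1 is consistent with a conversion of roughly 152 hours per staff-month. That factor is inferred here from the table; it is not stated in the paper. A minimal Python sketch under that assumption:

```python
# Assumed conversion factor, inferred from Table 1 (not stated in the paper).
HOURS_PER_STAFF_MONTH = 152.0

def staff_months(total_hours: float) -> float:
    """Convert total effort hours to staff-months, rounded to one decimal."""
    return round(total_hours / HOURS_PER_STAFF_MONTH, 1)

# Effort hours for projects 1 and 2 and the reported average, from Table 1.
for label, hours in [("project 1", 17715), ("project 2", 12588), ("average", 13522)]:
    print(label, staff_months(hours))
```

With this factor the three rows shown reproduce the published staff-month values; other rows may differ slightly because of rounding or transcription in the source.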

In most organizations, weekly time sheets are col- 
lected as part of cost accounting procedures so that phase 
data are the usual reporting mechanism. However, in the 
SEL, weekly activity data are also collected. The data 
consist of nine possible activities for each component 


Table 1. Project Size and Staff-Month Effort

Project    Size (lines    Total effort    Staff-
number     of code)       hours(a)        months

 1         15,500         17,715          116.5
 2         50,911         12,588           82.8
 3         61,178         17,039          112.1
 4         26,844         10,946           72.0
 5         25,731          1,514           10.0
 6         67,325         19,475          128.4
 7         66,260         17,997          118.4
 8         (b)            (b)             (b)
 9         55,237         15,262          100.4
10         75,420          5,792           38.1
11         89,513         15,122           99.5
12         75,393         14,508           95.4
13         85,369         14,309           94.1
Average    57,890         13,522           89.0

(a) All technical effort, including programmer and management time.
(b) Raw data not available in data base.

Figure 1. Typical life cycle.

(e.g., source program module). In this paper, these will
be grouped as design activities, coding activities (including
unit test), integration activities, acceptance testing
activities, and other. Specific meetings, such as design
reviews, will be grouped with their respective activity
(e.g., a design review is a design activity, a code
walkthrough is a coding activity, etc.).

Table 2 classifies the data presented in this paper.
Each column represents a type of work product (design,
code, test). The “by phase” part represents the effort
during that specific time period, while the “by activity”
part represents the actual amount of such activity.
“Other” does not enter into the “by phase” table, since
these activities occur during all phases. At NASA, 22%
of a project’s effort occurs during the design phase,
while 49% is during coding. Integration testing takes
16% while all acceptance activities take almost 13%.
(Remember that requirements data are not being collected
here. We are simply reporting the percentage of
design, coding, and testing activities. A significant
requirements activity does occur.)
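
The difference between the two accountings can be sketched in Python. The records below are hypothetical and only illustrate the bookkeeping; they are not SEL data, and the record layout is an assumption made for this example.

```python
from collections import defaultdict

# Hypothetical weekly records: (phase in effect that week, activity reported, hours).
records = [
    ("design", "design", 30),
    ("design", "other", 10),
    ("coding", "design", 12),
    ("coding", "coding", 20),
    ("coding", "other", 8),
    ("integration", "coding", 6),
    ("integration", "integration", 14),
]

def percentages(records, key_index):
    """Return effort share (%) grouped by the chosen field (0 = phase, 1 = activity)."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key_index]] += record[2]
    grand_total = sum(totals.values())
    return {key: round(100.0 * hours / grand_total, 1) for key, hours in totals.items()}

by_phase = percentages(records, 0)
by_activity = percentages(records, 1)
print(by_phase)
print(by_activity)
```

In this toy data the coding phase holds 40% of the hours while coding activity is only 26%, which is the kind of gap the two halves of Table 2 expose.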

By looking at all design effort across all phases of the
projects, we see that design activity is actually 26% of
the total effort rather than the 22% listed above. The
coding activity is a more comparable 30% rather than
the 49% listed by phase data, which means that the
coding phase includes many other tasks. “Other”
increased from 12% to 29% and includes many time-consuming
tasks that are not accounted for by the usual
life-cycle accounting mechanism. Here, “other” includes
acceptance testing as well as activities that take a
significant effort but are usually not separately identifiable
using the standard model. These include corporate
(not technical) meetings, training, travel, documentation,
and various other tasks assigned to the personnel.
The usual model of development does not include an
“other,” and this is significant since almost one-third of
a project’s costs are not effective at completing it. More
on this later.

The situation is actually more complex, since the 
distribution of activities across the project is not re- 
flected in Table 2. These data are presented in Tables 3- 
5. Only 49% of all design work actually occurs during 
the design phase (Table 3), and one-third of the total 
design activity occurs during the coding period. Over 
one-sixth (10.3% + 6.4%) of all design occurs during 




M. V. Zelkowitz


Table 2. Development Effort

Project    Design    Code     Integration    Accept. test
number     (%)       (%)      act. (%)       and other (%)

By Phase
 1         20.6      38.6     16.5           24.3
 2         16.2      48.4     19.3           16.2
 3         21.8      47.9     17.4           12.9
 4         35.9      39.5     24.5            0.1
 5         18.2      68.8     13.0            0.0
 6         16.3      48.6     10.9           24.3
 7         19.0      50.4     14.9           15.7
 8         22.9      48.4     13.0           15.8
 9         22.6      68.3      8.1            1.1
10         24.4      44.6     20.2           10.8
11         22.7      39.4     21.4           16.5
12         16.9      53.1     10.9           19.1
13         28.2      43.5     20.1            8.2
Average    22.0      49.2     16.2           12.7

By Activity
 1         17.4      16.4      9.9           56.3
 2         30.1      39.4     20.8            9.7
 3         26.3      20.3     19.3           34.2
 4         27.3      28.7      6.0           38.0
 5         31.0      35.5      9.4           24.1
 6         14.9      21.8     24.0           39.2
 7         20.2      25.9     14.3           39.6
 8         11.0      13.9      9.3           65.8
 9         31.3      43.5     18.9            6.4
10         38.2      37.3      6.1           18.4
11         29.3      31.0     17.2           22.5
12         23.7      46.5     24.0            5.9
13         32.6      36.3     15.6           15.6
Average    25.6      30.5     15.0           28.9


testing when the system is “supposed” to be finished. In 
almost one-third of the projects (4 out of 13), about 10% 
or more of the design work occurred during the final 
acceptance testing period. 
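
The count of projects with late design work can be checked directly against the acceptance test column of Table 3. This is a minimal sketch; reading "about 10% or more" as a cutoff of 9.9% is an assumption made here so that project 3 is included.

```python
# Share (%) of each project's design activity that fell in the acceptance test
# phase, projects 1-13, transcribed from the last column of Table 3.
design_in_acceptance = [14.3, 6.0, 9.9, 0.1, 0.0, 6.2, 14.1, 8.0, 0.0, 8.0, 10.9, 2.4, 2.9]

# "About 10% or more" is read here as >= 9.9%; that cutoff is an assumption.
late_design_projects = [i + 1 for i, pct in enumerate(design_in_acceptance) if pct >= 9.9]
average = round(sum(design_in_acceptance) / len(design_in_acceptance), 1)
print(late_design_projects, average)
```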

As to coding effort, Table 4 shows that while a major
part (70%) does occur during the coding phase, almost
one-quarter (16% + 7%) occurs during the testing
periods. As expected, only a small amount of coding
(7%) occurs during the design phase; however, the table
indicates that some coding does begin on parts of the
system while other parts are still under design. These
data have the widest variability, ranging from 0%
(project 10) to over 22% (project 3).

Similarly, Table 5 shows that significant integration 
testing activities (almost one-half) occur before the 
integration testing period. Once modules have been unit 
tested, programmers begin to piece them together to 
build larger subsystems, with almost half (43%) of the 
integration activities occurring during the coding phase. 

Due to the wide variability of the “other” category in 
Table 2, Table 6 presents the same data as relative 
percentages for design, coding, and integration testing 


Table 3. Design Activity During Life-Cycle Phases

Project    Design       Coding       Integration    Accept. test
number     phase (%)    phase (%)    test (%)       phase (%)

 1         41.8         33.9         10.0           14.3
 2         53.6         31.2          9.2            6.0
 3         33.3         37.1         19.7            9.9
 4         45.3         32.6         22.0            0.1
 5         17.4         69.1         13.5            0.0
 6         58.9         30.7          4.3            6.2
 7         63.9         15.3          6.8           14.1
 8         28.1         56.9          7.1            8.0
 9         61.8         38.2          0.0            0.0
10         57.8         27.2          7.0            8.0
11         58.7         13.7         16.67          10.9
12         58.9         32.8          5.9            2.4
13         60.5         24.7         11.9            2.9
Average    49.2         34.1         10.3            6.4


with the other category removed. As can be seen, design
took about one-third of the development effort and
varied between a low of 25% and a high of 47%, a
factor of almost 2. On the other hand, coding took an
average of 42% of the relative effort and varied between
36% and 49%, a factor of only 1.36. Testing ranged
from a low of 7.5% to a high of 39.5%, with an average
of 22%, for a relative factor of over 5.
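
The relative percentages can be sketched as a simple rescaling of the by-activity shares once the other category is dropped. The function name is hypothetical, and whether the authors rescaled published percentages or raw hours is not stated, so small rounding differences from Table 6 are expected.

```python
def relative_activity(design, coding, integration):
    """Rescale three activity shares so they sum to 100% once "other" is dropped."""
    total = design + coding + integration
    return tuple(round(100.0 * share / total, 1) for share in (design, coding, integration))

# By-activity shares for project 2 from Table 2 (design, code, integration).
print(relative_activity(30.1, 39.4, 20.8))
```

For project 2 the result agrees with the Table 6 row (33.3, 43.7, 23.0) to within a tenth of a percentage point.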

From Table 2, the “other” category was 29% of the
effort on these projects, and of the 13 measured projects,
other activities consumed more than one-third of the
effort on six of them. The other category consists of
activities such as travel, completion of the data collection
forms, meetings, and training. While these activities
are often ignored in life-cycle studies, the costs are
significant. Table 7 presents the distribution of other


Table 4. Coding and Testing Activity During Life-Cycle Phases

Project    Design       Coding       Integration    Accept. test
number     phase (%)    phase (%)    test (%)       phase (%)

 1          1.4         78.3         11.3            9.1
 2          0.0         72.8         19.7            7.5
 3         22.2         56.2         11.8            9.8
 4         16.4         58.5         25.1            0.1
 5         21.2         68.7         10.1            0.0
 6          0.5         77.3         11.3           10.9
 7          1.3         73.9         15.6            9.2
 8         14.7         54.7         21.0            9.7
 9          5.2         91.1          3.1            0.6
10          0.0         73.0         22.5            4.5
11          2.2         70.5         20.1            7.2
12          0.3         74.8          8.3           16.6
13          4.6         63.6         26.9            4.9
Average     6.9         70.3         15.9            6.9





Resource Utilization 




Table 5. Integration Activity During Life-Cycle Phases

Project    Design       Coding and unit    Integration    Accept. test
number     phase (%)    phase (%)          test (%)       phase (%)

 1          0.0         17.8               27.4           54.7
 2          0.0         45.2               30.1           24.7
 3          6.1         53.9               21.1           18.9
 4         21.0         39.3               39.7            0.0
 5         28.4         71.0                0.6            0.0
 6          1.0         40.9               17.6           40.5
 7          0.5         54.1               26.3           19.2
 8          2.9         33.8               19.2           44.1
 9          0.0         66.4               29.2            4.4
10          0.0         23.1               41.5           35.5
11          0.0         36.4               35.1           28.5
12          0.1         32.7               22.4           44.8
13          1.5         49.5               28.8           20.2
Average     4.7         43.4               26.1           25.8


Table 7. Other Activities Effort in Each Phase

Project    Design       Coding and testing    Integration    Accept. test
number     phase (%)    phase (%)             test (%)       phase (%)

 1         23.3         32.2                  18.1           26.5
 2          0.0          9.1                  26.4           64.6
 3         21.7         47.8                  16.8           13.7
 4         46.2         30.2                  23.6            0.0
 5         11.0         67.7                  21.3            0.0
 6         18.2         44.2                   9.0           28.7
 7         14.4         51.6                  14.5           19.5
 8         26.5         47.7                  11.4           14.4
 9         15.9         65.5                  18.7            0.0
10         12.4         30.2                  35.9           21.5
11         21.4         32.2                  18.9           27.6
12         47.3         46.6                   4.6            1.5
13         42.5         30.0                  12.7           14.9
Average    23.1         41.2                  17.8           17.9


activities across all phases. While such effort varies
widely from project to project, no general trends can be
observed, except that it does take a significant effort as a
percent of total costs.
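
The size of the other category can be checked against the by-activity half of Table 2. The sketch below simply counts projects whose "accept. test and other" share exceeds one-third and recomputes the mean; the variable names are illustrative.

```python
# "Accept. test and other" share (%) by activity, projects 1-13, from Table 2.
other_share = [56.3, 9.7, 34.2, 38.0, 24.1, 39.2, 39.6, 65.8, 6.4, 18.4, 22.5, 5.9, 15.6]

over_one_third = [i + 1 for i, pct in enumerate(other_share) if pct > 100.0 / 3.0]
mean_other = round(sum(other_share) / len(other_share), 1)
print(over_one_third, mean_other)
```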

3. CONCLUSIONS 

Using data from the SEL database, it seems that the 
software development process does not follow the 
waterfall life cycle but appears to be more a series of 
rapids as one process flows into the next. Significant 
activities cross phase boundaries and do not follow 
somewhat arbitrary milestone dates. The classical prod- 
uct-driven model has many shortcomings. 

In the SEL environment, as well as elsewhere, other 
classes of activities take a significant part of a project’s 
resources. At almost one-third of the total effort, it 


Table 6. Relative Activity

Project    Design      Coding and unit    Integration
number     act. (%)    act. (%)           act. (%)

 1         39.9        37.5               22.6
 2         33.3        43.7               23.0
 3         39.9        30.8               29.3
 4         44.0        46.3                9.7
 5         40.8        46.8               12.3
 6         24.6        35.9               39.5
 7         33.5        42.8               23.6
 8         32.2        40.7               27.1
10         46.8        45.7                7.5
11         37.8        40.1               22.1
12         25.2        49.4               25.5
13         38.6        43.0               18.4
Average    36.2        42.2               21.6


might be part of an explanation of why software is 
typically over budget. Estimating procedures often use a 
work breakdown structure where the system is divided 
into small pieces and estimates for each piece are 
summed up. Inclusion of a significant “other” usually 
does not occur. 
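
One way to see the effect on a bottom-up estimate is to scale the summed technical estimate so that a fixed fraction of the total is reserved for other effort. This is an illustrative sketch, not a procedure proposed in the paper. Using the 29% by-activity average from Table 2 as the default is an assumption, and that figure also includes acceptance testing.

```python
def total_effort(piece_estimates, other_fraction=0.29):
    """Scale a bottom-up technical estimate so that `other_fraction` of the
    resulting total is left for non-technical ("other") effort."""
    if not 0.0 <= other_fraction < 1.0:
        raise ValueError("other_fraction must be in [0, 1)")
    technical = sum(piece_estimates)
    return technical / (1.0 - other_fraction)

# Hypothetical work-breakdown estimates, in staff-months.
pieces = [12.0, 20.0, 8.0, 31.0]
print(round(total_effort(pieces), 1))
```

Here 71 staff-months of itemized work grows to about 100 staff-months once the other category is accounted for.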

Newer technology is affecting this traditional model
even more. In one NASA experiment, a prototype of a
project was developed as part of the requirements phase
[8]. In this case, 33,000 lines of executable Fortran were
developed at a cost of 93.1 staff-months, already a
significant project in this SEL environment. When
viewed as a separate development, the prototype had a
life cycle typical of the data presented here, but if
viewed as only a requirements activity it puts a severe
strain on existing models.

Current models do not handle executable products as 
part of requirements. Other questions arise: Are Ada 
package specifications design or code? Are executable 
specification languages specification or design? When 
does testing start? 

It is clear that our current product-driven models need
to be updated. Other models, such as the spiral model,
which is an iterative sequence of risk-assessment decisions,
or value chaining, which addresses value added
by each phase, are alternative approaches that need to
enter our vocabulary and be further studied for effectiveness.

Abstract

Software engineering is a challenging new application area for
information bases. The new challenges are twofold: how
software engineering processes and products can be properly
modeled, and how such processes and products can be mirrored
naturally within an information base. Meeting these
challenges requires software engineering and information base
research. Our "Meta Information Base for Software Engineering"
project at the University of Maryland represents such a
joint research effort. The idea of our approach is to generate
customized software engineering information bases from formal
specifications of software engineering processes and products.
The three central research topics are: (i) develop a software
process and product specification language which permits
specifying all the information necessary to understand, control
and improve any given software engineering process; (ii) develop
a meta information base schema which automatically generates an
information base structure given a software process and product
specification; and (iii) develop a mapping between the
software engineering oriented and information base oriented
models. The generator approach acknowledges the fact that
software engineering changes not only from environment to
environment, but also from project to project. If an information
base is expected to truly mirror and support a given
software engineering project, it needs to be tailorable to the
changing characteristics of the software project itself. The
generator based approach suggested by our project seems to be the
natural approach to satisfy this important need.

This paper presents the information base oriented part of our
joint project. It discusses how to represent a set of software
process and product type specifications in a database and how
to use these to automatically generate database support for
process executions and product instances.

Introduction 

When we began our research on a Meta Information Base for
Software Engineering, one of the first future research topics we
identified was object-oriented database systems. However, as
the reader may already have noticed, the words object-oriented
were not mentioned in the title or abstract of this paper. While
a number of object-oriented database systems have been proposed
in the past few years [Dittrich 86], there seems to be
little consensus as to what such a system should be. Although
our research will eventually lead to our version of an
object-oriented database system, we are currently using and extending
existing relational database technology, trying to find out how
far it will take us. Others, following the same approach, have
extended the relational model with more semantics [Codd 79],
provided better support for complex objects [Dadam 86], and
extended the data definition capabilities with support for type
inheritance [Borgida 88]. The common goal of these efforts,
best described in [Carey 88], is to use and extend existing
relational technology, but to retain a powerful non-procedural
query language.

* We have also submitted a discussion of the software engineering
oriented part of our project for the "Software Engineering Process
Models and Analysis" track of this same conference [Rombach 88].

Our joint project is based on the following framework for a
Software Engineering Environment [Rombach 88]:



Framework for SEE 

This paper concentrates on the following two information base
issues:

- the representation of a formal specification of a set of
  software engineering process and product type descriptions
  using an extended relational data model, and

- the automatic generation of information base support for
  software process executions and product instances based
  on their type descriptions.

As a basis for addressing the first issue we shall, in section 2,
briefly summarize the set of requirements for a software process
and product specification language (as described in our companion
paper [Rombach 88]). In section 3, we introduce a
graphical formalism for the extended relational model. In section 4,
we show the representation of formal specifications.
The notion of a Self-Describing Database System [Mark 85],
briefly described in section 5, is the basis for the generator
approach discussed in section 6.

Requirements for a Software Process and Product 
Specification Language 

We distinguish between very general requirements which acknowledge
the basic nature of software processes and more
specific requirements [Rombach 88].

In this section we shall restrict ourselves to listing the specific
requirements for a software process and product specification
language from a planning perspective and from an execution
perspective.

Since we want formal specifications in this language
represented in our software engineering information base, the
specific requirements, from a planning perspective, have direct
bearing on the schema design of the information base. Similarly,
since we want to automatically generate information base
support for software process executions and product instances
based on the formal specifications, the specific requirements,
from an execution perspective, also have direct bearing on the
schema design.

From a planning perspective, the specific requirements for
the specification language include the ability to specify:

1. process, product, and constraint types

2. produce (output) and consume (input) relationships
   between product and process types

3. control flow relationships between process types (sequence,
   alternation, iteration, and parallelism)

4. structural relationships between product types (sequence,
   alternation and iteration)

5. dependency relationships between process types (Process
   P1 is dependent on process P2 if every execution of P2
   triggers a simultaneous execution of P1. Typically,
   measurement processes are dependent on the construction
   processes which they are supposed to monitor.)

6. pre-condition and post-condition relationships between
   constraint and process/product types (A pre-condition of
   a process is a constraint imposed upon initiation of this
   process; a post-condition of a process is a constraint
   imposed upon termination of this process; a condition of a
   product is a constraint imposed upon this product.)

7. aggregation and decomposition of process and product
   types

8. generalization and specialization of process and product
   types

9. constructive as well as analytic (measurement-oriented)
   product and process types

10. different roles (Different roles are performed in a software
    project such as design role, test role, quality assurance
    role, or management role. Roles define views or perspectives
    of (a subset of) the processes and products relevant
    to a particular project. Type and number of roles may
    change from project to project.)

11. time (relative and absolute) and space (software structure,
    versions, configurations) dimensions

12. dialogues between processes (including human beings)

In section 4 we design the schema for the meta information
base to meet most of these requirements.

The specific requirements from an execution perspective
for a software process and product specification language
include the ability to handle:

1. the instantiation (creation of objects) of process, product
   and constraint types

2. long-term, nested transactions (Many software engineering
   processes such as designing may stretch over weeks or
   months; in addition, they may contain nested activities.)

3. varying degrees of persistence (Some information needs to
   be kept forever, some only for the duration of the project,
   and others only until a new instance (e.g., product version)
   has been created.)

4. tolerance of inconsistency (Because of the long-term
   nature of software engineering processes, it might be
   necessary to store intermediate information that does not
   yet conform with the desired consistency criteria.)

5. dynamic types (type hierarchies) (An object of type product
   (e.g., a compiler developed during one project) may
   be used as an object of type process during a future project.)

6. non-determinism due to user interaction

7. dynamic changes of process specification types (It is
   impossible to plan for all possible (non-deterministic)
   results produced by human beings in advance. However,
   we would like to react to those situations by dynamically
   re-planning during execution. Although it is not a problem
   to change a specification during the planning stage, it
   might be a problem to change the specifications during
   execution while preserving the current execution state.)

8. back-tracking due to execution failures

9. the organization of historical sequences of product objects
   and process executions

10. the enormous amounts of interaction between parallel
    activities

11. the role specific interpretation of facts (The same process
    and product facts might require different interpretation in
    the context of different roles.)

12. the triggering of actions (based on pre-conditions and
    post-conditions)

The list of execution point-of-view requirements is heavily
influenced by the results of a working group during the 4th
International Workshop on Process Specification
(Moretonhampstead, UK, May 1988), chaired by Tom Cheatham
[IWSPS 88]. Some of these requirements will be met by our
automatically created information base; however, many of them
define topics for future research on information bases.

Graphical Formalism for an
Extended Relational Model

To create a schema for a meta information base that completely
and precisely mirrors the fundamental software
engineering concepts, we need a powerful data model. A data
model consists of a language for defining data structures, a
language for defining constraints, and a language for data
manipulation and query processing. We shall primarily concentrate
on the data structure language in this paper. This
language is an extended version of the relational model with a
number of concepts borrowed from semantic data models and
object oriented data models.

The language supports two kinds of uniquely named domains: 
lexical and non-lexical. 



Non-lexical domains (full circles) model object-sets and lexical
domains (broken circles) model object-name-sets. We shall
almost entirely be using non-lexical domains in the
specification. The reason is that we concentrate on modeling
the fundamental software engineering concepts and their
relationships, and postpone the aspects of how the concepts are
lexically described, represented, and referenced. In an
implementation the non-lexical domains will be represented by
surrogates [Hall 76], which are system generated, internal, unique
identifiers for objects.

Since surrogate values are internal to the system and invisible
to the user, we are primarily modeling the invisible part of the
meta information base, and ignoring the visible part. Several
important observations are related to the use of surrogates:

1. Surrogates allow us to model aggregate objects fairly
   easily while preserving normalized relation representations.
   Non-procedural query languages, like relational
   calculus, can be used for query processing without change.
   If a nested relation representation [Thomas 86] was used,
   allowing attributes with set values, we would violate first
   normal form representations and we would be forced to
   use a powerset calculus.

2. Surrogates allow us to model generalization in a
   straightforward manner [Codd 79]; however, a slight generalization
   of relational calculus is needed if we want to utilize
   inheritance in queries.

3. As far as a database system is concerned, anything stored
   as an instance of a lexical object type is primitive, and all
   the database system can do is insert, delete, or retrieve it.
   Surrogates are ideal for modeling the structure of new
   user defined object types, providing a means for extensibility.
   However, the complete structure of an object
   must be explicitly represented if the user wants to use the
   relational calculus to manipulate and answer questions
   about the structure of the object.

4. Breaking down everything to obtain an explicit representation
   of the internal structure of objects may result in
   inefficiency from a system point-of-view. However,
   current research on view cache and incremental computation
   models shows very promising results [Roussopoulos
   87]. Inefficiency from a user point-of-view can basically
   be ignored because relational views can be used to define a
   higher level query interface when needed.
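
The first observation can be illustrated with a small sketch using Python's standard sqlite3 module. The table and column names are hypothetical and this is not the authors' schema. It only shows an aggregate represented by surrogate keys in flat, first-normal-form relations and queried with an ordinary declarative join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One row per object; the integer key plays the role of a surrogate.
    CREATE TABLE product (surrogate INTEGER PRIMARY KEY);
    -- Aggregation as a flat relation: one (aggregate, component) pair per row.
    CREATE TABLE product_component (
        aggregate INTEGER NOT NULL REFERENCES product(surrogate),
        component INTEGER NOT NULL REFERENCES product(surrogate),
        PRIMARY KEY (aggregate, component)
    );
    -- Lexical naming kept separate from the non-lexical object.
    CREATE TABLE product_name (
        surrogate INTEGER PRIMARY KEY REFERENCES product(surrogate),
        name TEXT UNIQUE NOT NULL
    );
""")

def new_product(name=None):
    """Insert an object, let the system generate its surrogate, optionally name it."""
    surrogate = conn.execute("INSERT INTO product DEFAULT VALUES").lastrowid
    if name is not None:
        conn.execute("INSERT INTO product_name VALUES (?, ?)", (surrogate, name))
    return surrogate

system = new_product("system")
parser = new_product("parser")
scanner = new_product("scanner")
conn.executemany("INSERT INTO product_component VALUES (?, ?)",
                 [(system, parser), (system, scanner)])

# A plain, set-oriented query over first-normal-form relations.
rows = conn.execute("""
    SELECT n.name FROM product_component c
    JOIN product_name n ON n.surrogate = c.component
    WHERE c.aggregate = ? ORDER BY n.name
""", (system,)).fetchall()
component_names = [row[0] for row in rows]
print(component_names)
```

No set-valued attribute is needed: the components of an aggregate are recovered by a join rather than by unnesting.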


An arrow between two domains represents an is-a relation
type. In the example below, the object type O has subtypes O1
and O2. An is-a relation type represents a total function from
the subtype to the supertype. The set of is-a relation types
defines a directed acyclic graph on the set of domains. Various
rules for inheritance may be adopted; however, inheritance
from multiple supertypes is hard to define properly [Borgida
88].
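
Because the is-a relation types form a directed acyclic graph, the supertypes of a domain can be collected by a simple traversal. The sketch below is illustrative only. The domain O3, with two supertypes, is a hypothetical addition used to show a net rather than a tree, and no particular inheritance rule is implied.

```python
# Hypothetical is-a relation: each subtype maps to its direct supertypes.
IS_A = {
    "O1": ["O"],
    "O2": ["O"],
    "O3": ["O1", "O2"],  # multiple supertypes: a net, not a tree
}

def supertypes(domain, is_a=IS_A):
    """Return every direct or indirect supertype of `domain`."""
    found = set()
    stack = list(is_a.get(domain, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(is_a.get(current, []))
    return found

print(sorted(supertypes("O3")))
```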



Relation types are uniquely named and are represented by the
notation below. Attributes model the roles of the
corresponding domains in relations. Attribute names may be
omitted, in which case the corresponding domain name is used.
However, attribute names must be unique within relations.

Identifier constraints (double-headed arrow under an attribute
combination) model partial functions from an attribute combination
to each of the other attributes in the relation.



Rather than using relational normalization, we aim at identifying
atomic facts. Multiple atomic facts may later be combined
into larger relations while preserving at least Boyce-Codd Normal
Form (BCNF). As is customary in object-role data models,
we shall model all concepts in terms of domains. The role of
the relations is therefore reduced to capture the aggregates
that form the concepts and to relate the concepts. An important
advantage of our specification language over the traditional
relational data definition language is that it clearly indicates
that only attributes over the same domain can be used as
a basis for entity joins between relations. The relational model
traditionally only supports domains of primitive types and does
not support a strong typing concept.
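
The join restriction can be sketched as a check on attribute domains. The class, relation, and attribute names below are hypothetical and chosen only for illustration; the paper expresses the rule graphically rather than in code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Attribute:
    relation: str
    name: str
    domain: str  # the uniquely named domain the attribute ranges over

def can_join(left: Attribute, right: Attribute) -> bool:
    """Permit an entity join only when both attributes share one domain."""
    return left.domain == right.domain

# Hypothetical attributes for illustration.
component = Attribute("AggregateProcess", "component", "ProcessDescription")
named = Attribute("ProcessName", "process", "ProcessDescription")
produced = Attribute("Produces", "output", "ProductDescription")

print(can_join(component, named), can_join(component, produced))
```

A data definition language over primitive types alone would accept both joins, since every surrogate column would simply be an integer.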

Although some types of aggregate objects, e.g., abstract syntax
trees, could be conveniently represented by recursively defined
relation types, whatever that is, we have not considered such
an extension of our model because databases have a hard time
managing instances that are not all of the same structure and
size.

Our approach clearly allows us to use the relational calculus
for data manipulation and query processing.

A significant advantage of this is that a powerful constraint
definition capability may be based on the relational calculus.
Although important, we shall postpone further discussion of
the query language and the constraint capability to a later
paper.

Representation of Process and Product 
Specifications 

Before we start, let it be perfectly clear that we are dealing with
three levels of information. What we are about to design in
this section is a schema - or rather a meta-schema - that
describes all process and product descriptions that can be
defined in the specification language introduced in [Rombach
88].

The data stored under this schema are, in other words, process
and product descriptions and can in turn be interpreted as the
schema for process executions and product instances under
these descriptions. This issue is discussed in sections 5 and 6.
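
The idea that stored descriptions can in turn act as a schema can be sketched as follows. This is a hedged illustration, not the generator described in the paper. The description names, the dictionary layout, and the mapping of components to surrogate-valued columns are all assumptions made for the example.

```python
import sqlite3

# Hypothetical product descriptions, i.e. data stored under the meta-schema.
descriptions = {
    "design_document": ["module_spec", "interface_spec"],
    "module_spec": [],
    "interface_spec": [],
}

def generate_schema(descriptions):
    """Emit one CREATE TABLE statement per description; components become
    surrogate-valued columns referring to the component's table."""
    statements = []
    for name in sorted(descriptions):
        columns = ["surrogate INTEGER PRIMARY KEY"]
        columns += [f"{part} INTEGER REFERENCES {part}(surrogate)"
                    for part in descriptions[name]]
        statements.append(f"CREATE TABLE {name} ({', '.join(columns)});")
    return statements

# Load the generated statements into an in-memory database to show they are usable.
conn = sqlite3.connect(":memory:")
conn.executescript("\n".join(generate_schema(descriptions)))
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)
```

The three levels are visible here: the generator code stands in for the meta-schema, the dictionary holds descriptions, and the generated tables would hold product instances.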

The two fundamental concepts in the specification language are
process descriptions and product descriptions.

Process descriptions and product descriptions are modeled by 
the two domains shown below. 



Instances of process descriptions and product descriptions are
tied to their type through the insert operation. Therefore, a
"member" relationship need not be modeled explicitly.

We use the concept process recursively in two ways.
First, a process description may be an aggregate of a set of
component process descriptions. In an aggregation we form a
concept from existing concepts. The phenomena that are
members of the new concept’s extension are composed of
phenomena from the extensions of the existing concepts.
Second, a process description may be a generalization of a more
specific process description. In a generalization we form a new
concept by emphasizing common aspects of existing concepts,
while leaving out special aspects. The phenomena that are members of
the existing concepts are all members of the new concept, and
they therefore inherit all the attributes of the members of the
new concept. Aggregation and generalization are classical
themes in object oriented databases [Smith 77].

The aggregate process descriptions are modeled below. A
process description may be reused in many aggregate process
descriptions. Aggregate process descriptions may have multiple
levels, but cannot be defined recursively (i.e., a process
description cannot contain itself as a component at any level).
This constraint is not modeled below.
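
Although the constraint is not part of the graphical model, it could be checked procedurally. The sketch below is one possible check, a depth-first search for a cycle in the component relation. The function name and the dictionary representation are assumptions, not part of the paper.

```python
def contains_itself(components):
    """Return True if some process description is its own component at any level.

    `components` maps an aggregate's identifier to its direct components.
    """
    WHITE, GREY, BLACK = 0, 1, 2
    state = {}

    def visit(node):
        state[node] = GREY
        for child in components.get(node, ()):
            colour = state.get(child, WHITE)
            if colour == GREY:
                return True  # back edge: child is an ancestor of node
            if colour == WHITE and visit(child):
                return True
        state[node] = BLACK
        return False

    return any(state.get(node, WHITE) == WHITE and visit(node)
               for node in list(components))
```

Reuse of one component in several aggregates is allowed by this check; only a description that reaches itself through its own components is rejected.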

As modeled by the second relation type below, some but not all
process descriptions may have names. We have modeled these
names to be universally unique. Other models are of course
possible.



Within aggregate process descriptions, the component process
descriptions may be sequential, alternative, parallel, or
iterated. Only process descriptions that are parts of an aggregate
process description can be used in any of these ordering
schemes. Since process descriptions may be reused in many
aggregate process descriptions, the ordering must be aggregate
process description specific. Our approach is to model the restrictions
imposed by the ordering schemes. Since parallelism is
not a restriction, we need not model it. Sequence is, for convenience,
assumed to be represented in a relation where the
tuples are ordered on the aggregate process descriptions and
subsequently on the component process descriptions. The order
of the component elements will, of course, depend on the lexical
representation of their names, since it makes no sense to
order on the non-lexical surrogate values. Iteration will simply
be modeled as a "goto".
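
One way to read the sequence and "goto" modeling is sketched below. It is only an illustration under stated assumptions: the relation layouts, the function names, the sample component names, and the rule that each goto is taken a fixed number of times are all hypothetical.

```python
def sequence_of(aggregate, seq_relation):
    """Components of one aggregate, ordered on the lexical form of their names."""
    return sorted(c for a, c in seq_relation if a == aggregate)

def trace(aggregate, seq_relation, goto_relation, repeats):
    """List the components visited, taking every goto `repeats` times.

    goto_relation: (aggregate, from_component, to_component) tuples.
    """
    order = sequence_of(aggregate, seq_relation)
    target = {src: dst for a, src, dst in goto_relation if a == aggregate}
    remaining = {src: repeats for src in target}
    visited, i = [], 0
    while i < len(order):
        step = order[i]
        visited.append(step)
        if remaining.get(step, 0) > 0:
            remaining[step] -= 1
            i = order.index(target[step])   # iteration: jump back
        else:
            i += 1
    return visited

seq = [("dev", "2_design"), ("dev", "1_analysis"), ("dev", "3_code"), ("dev", "4_test")]
gotos = [("dev", "4_test", "3_code")]
```

Because the goto tuples carry the aggregate, the same component can iterate in one aggregate and not in another, which matches the requirement that ordering be aggregate specific.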




The generalized process descriptions are modeled below.
Notice that a process description may be in more than one generalization
(i.e., we model a generalization net rather than a
generalization hierarchy). However, the generalization net cannot
contain cycles; this constraint is not modeled below. This
model will provide the information needed to support any
inheritance scheme we may want to adopt.

To complete the two recursive definitions, we must model the
fact that an aggregate process description and a generalized
process description are themselves process descriptions.












We use the concept product recursively in two ways.

First, a product description may be an aggregate of a set of
component product descriptions. Second, a product description
may be a generalization of more specialized product descriptions.

As modeled by the second relation type below, some but not all
product descriptions may have names. We have modeled these
names to be universally unique. Other models are of course
possible.





To summarize, we have now modeled how aggregate and generalized
process and product descriptions can be defined from
other process and product descriptions. Since process and product
description instances are tied to their respective type by
insertion, we can summarize our complete model (the dashed
arrows indicate "member" relationships that are maintained
through insertion).

All the non-lexical domains are represented by surrogates.
Any information about the objects modeled by these surrogates,
including their lexical representation, will be connected
to the surrogates.


The fundamental relationship between process executions and
product instances is that a process execution uses a set of
product instances as input and produces a set of product
instances as output. To model this at the process and
product description level, we need the following relation. The
i/o domain consists of the values {i, o, io}.
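
As an illustration of the relation just described, the following sketch holds (process, product, mode) tuples and restricts the mode to the i/o domain {i, o, io}. The helper names and the sample process and product names are assumptions made for the example:

```python
IO_DOMAIN = {"i", "o", "io"}   # input only, output only, both

def check_io_relation(tuples):
    """Validate (process, product, mode) tuples against the i/o domain."""
    for _process, _product, mode in tuples:
        if mode not in IO_DOMAIN:
            raise ValueError(f"{mode!r} is not in the i/o domain")
    return set(tuples)

def products_used(relation, process):
    """Product descriptions the process takes as input."""
    return sorted(p for proc, p, mode in relation
                  if proc == process and mode in ("i", "io"))

def products_produced(relation, process):
    """Product descriptions the process delivers as output."""
    return sorted(p for proc, p, mode in relation
                  if proc == process and mode in ("o", "io"))

process_product_io = check_io_relation([
    ("design", "requirements", "i"),
    ("design", "design_document", "io"),
    ("coding", "design_document", "i"),
    ("coding", "code", "o"),
])
```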


We model the generalized product descriptions below. As
with generalized process descriptions, a product description
may be reused in more than one generalization (i.e., we model
a generalization net rather than a generalization hierarchy).
However, the generalization net cannot contain cycles; this constraint
is not modeled below.



This model will provide the information needed to support any 
inheritance scheme we may want to adopt. 
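
To make the remark about inheritance concrete, here is a small sketch of one possible scheme built on a generalization net: a description inherits the attributes of every description that generalizes it, directly or indirectly. The pair layout (general, specific), the function names, and the sample data are hypothetical; the paper deliberately leaves the inheritance scheme open.

```python
def generalizations_of(description, net):
    """All descriptions that generalize `description`, directly or indirectly.

    net: iterable of (general, specific) pairs; a description may have
    several generalizations, so this is a net rather than a hierarchy.
    """
    parents = {}
    for general, specific in net:
        parents.setdefault(specific, set()).add(general)
    seen, stack = set(), [description]
    while stack:
        for g in parents.get(stack.pop(), ()):
            if g not in seen:
                seen.add(g)
                stack.append(g)
    return seen

def inherited_attributes(description, net, own_attributes):
    """Union of a description's own attributes and those of its generalizations."""
    result = set(own_attributes.get(description, ()))
    for g in generalizations_of(description, net):
        result |= set(own_attributes.get(g, ()))
    return result

net = [("document", "design_doc"), ("reviewable", "design_doc"), ("artifact", "document")]
attrs = {"artifact": {"owner"}, "document": {"title"},
         "reviewable": {"reviewer"}, "design_doc": {"level"}}
```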

To complete the two recursive definitions, we must model the 
fact that an aggregate product description and a generalized 
product description are themselves product descriptions. 






Some software methodologies require detailed i/o information
for each element in a document rather than for the document 
as a whole. This requirement is supported by our model 
through the use of the recursive definition of process and pro- 
duct descriptions. 

The concept of mapping is introduced to allow process and
product descriptions in a project using one software methodology
to be compared to process and product descriptions in a
project using a different software methodology. We must provide
data structures that help the software engineer define
mappings between process and product descriptions in different
software methodologies. We model a rudimentary mapping
definition capability below.

Before a mapping can be defined, we may have to use the
recursive process and product definition capability in order to
bring the concepts we want to compare to the same level of
abstraction. Once this is done we can use the mapping
definition capability.


The notion of user views is very important. User views are
needed for managers, designers, programmers, etc. In general, a
user view is defined as a consistent collection of product and
process descriptions together with a collection of product
instances and data about process executions that conform to
the descriptions and are relevant to a particular project.



The notion of a process pre-constraint and post-constraint
on a database is as important as the notion of a
control mechanism in software engineering. Since different
pre-constraints and post-constraints may apply to the same
process description used in different aggregate process descriptions,
we have to tie the relationship between process descriptions [...]



Like static constraints in a database, we think about the 
notion of product constraint as something independent from 
the processes that use and produce the product. We therefore 
model product constraints as follows: 



We have introduced a large number of database constraints
between the model of process and product descriptions and the
instances of these descriptions. The most natural way of maintaining
consistency between the surrogates in an aggregation
and generalization hierarchy is through the use of a well
defined set of operations for insertion and deletion.

Maintaining consistency between the lexical representations
is a much more complicated problem. Fortunately,
part of this problem has a very elegant solution.

To control the consistency of lexical representations we
only store the lexical representations of the atomic process
and product descriptions; an object is atomic if it is not defined
as an aggregate or a generalization. Lexical representations of
aggregate and generalized process and product descriptions
should merely refer to the other aggregate and generalized
process and product descriptions and to the atomic process
and product descriptions directly used in their descriptions. To
avoid storing multiple almost identical copies of atomic process
and product descriptions, we shall investigate incremental file
representation techniques where a new file which is an almost
identical copy of an existing file is represented by a pointer to
the existing file plus a file differential. Techniques of this
nature are discussed in [Roussopoulos 87].
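
The pointer-plus-differential idea can be sketched in a few lines. This is not the technique of [Roussopoulos 87], only a generic illustration that assumes files are lists of lines and uses the standard `difflib` module to compute the differential; the function names are hypothetical.

```python
import difflib

def make_differential(base_lines, new_lines):
    """Record only the regions where new_lines departs from base_lines.

    Each entry is (start, end, replacement): base_lines[start:end] is
    replaced by `replacement` when the new file is materialized.
    """
    matcher = difflib.SequenceMatcher(a=base_lines, b=new_lines, autojunk=False)
    return [(i1, i2, new_lines[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]

def materialize(base_lines, differential):
    """Rebuild the new file from the existing file plus the differential."""
    result, cursor = [], 0
    for start, end, replacement in differential:
        result.extend(base_lines[cursor:start])   # unchanged stretch
        result.extend(replacement)                # changed stretch
        cursor = end
    result.extend(base_lines[cursor:])
    return result

base = ["procedure design", "step 1", "step 2"]
variant = ["procedure design", "step 1", "step 2a", "step 3"]
delta = make_differential(base, variant)
```

Storing `delta` together with a reference to `base` is enough to recover `variant`, and an identical copy costs an empty differential.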

The lexical representation of non-atomic objects can be 
materialized through the use of relational views. 

Based on the above discussion we can now model the storing
of lexical representations of atomic process and product
descriptions. What these lexical representations look
like will of course depend on which language we choose for
their representation.

Currently available database management systems do not
directly support the storing of large, variable size, unstructured
lexical objects. A possible but not very desirable solution is to
develop a program that stores these objects on files under
operating system control and stores addresses of the files under
database control.

We model the lexical representation of atomic objects as fol- 
lows: 









A version normally refers to an object that is almost identical
to another object. In our model, the concepts of process
description and product description can be used to model the
notion of version, and we shall not introduce versions as a
separate concept.


A configuration normally refers to a collection of versions. 
Again, the concept of configuration will not be introduced as a 
separate concept because it can be modeled by the concepts 
already defined.

The concept of measurements has recently been the subject
of considerable attention in software engineering. Measurement
can be perceived as a product instance or a process execution.
A measurement can be part of a product instance or a
product instance in its own right, or a measurement can be part
of a process execution or a process execution in its own right. A
measurement can therefore be described by or as part of a product
description, or it can be described by or as part of a process
description. Therefore, we shall not introduce measurement
as a new concept.

Process executions and product instances have several time
attributes associated with them. Examples are the actual start
and end times of process executions and the actual time of
creation of product instances. Examples of time attributes for
process and product descriptions are time of creation and last
time executed and instantiated. Other time attributes are
defined on a relative time scale (e.g., one process execution
must precede another one).

Time attributes are, however, examples of measurements, and
we shall therefore not introduce the time concept explicitly at
this stage.

The purpose of this section has been to provide a formal 
schema definition that completely and correctly mirrors funda- 
mental concepts in our process and product specification 
language independently of their lexical representation. 

The next step is to define the lexical object-name-sets that will
allow us to reference and represent the concepts. It is very
important to understand that the information base is completely
blind with respect to the internal structure of the
object-names; it cannot see, use, or maintain any internal
structure of object-names (e.g., an object-name-set may consist
of a set of Ada programs, but they all look like text strings
to the information base). This implies that the maintenance of
any structure of or constraints between object-names is the
sole responsibility of the users and software tools accessing the
information base.

Self-Describing Database Systems 

A Self-Describing Database System is unique in that it provides
an active and integrated data dictionary as part of
the database management system. Such a data dictionary system
is essential in our system.

The architecture of a Self-Describing Database System is illustrated
below [Mark 85]. This architecture has recently been
adopted by the ANSI SPARC [Burns 86] as the basis for a new
Reference Model for database management systems, and it is
the basis for current work in the ISO.

The core DBMS supports the well-known point-of-view
dimension of data description, which consists of internal, conceptual,
and external schemata. In addition, it supports and
enforces the intension-extension dimension of data description.
The intension-extension dimension has four levels of data
description. Application data are stored as data. The
application schemata, describing and controlling the use of
the application data, are stored in the data dictionary. The
rules for defining, managing, and controlling the use of
the application schemata are stored in the data dictionary
schema. A fundamental set of rules for defining schemata
(i.e., a description of the data models supported by the
Self-Describing Database System) is defined in the meta-schema.
The set of rules in the meta-schema will allow the
management strategies represented in the data dictionary
schema to evolve in accordance with changing data management
policies. Each level of data description in the intension-extension
dimension is the extension of the level above it, and
the intension for the level below it. The meta-schema is self-describing
(i.e., it is one of the schemata it describes).
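
The four levels and their intension/extension relationship can be summarized in a small sketch. The list order follows the description above; the function names are hypothetical, and mapping the meta-schema to itself is simply one way to express that it is self-describing.

```python
# Levels of the intension-extension dimension, from most to least abstract.
LEVELS = [
    "meta-schema",
    "data dictionary schema",
    "data dictionary (application schemata)",
    "application data",
]

def intension_of(level):
    """Level whose rules describe `level`; the meta-schema describes itself."""
    index = LEVELS.index(level)
    return LEVELS[max(index - 1, 0)]

def extension_of(level):
    """Level described by `level`, or None for application data."""
    index = LEVELS.index(level)
    return LEVELS[index + 1] if index + 1 < len(LEVELS) else None
```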

The core DBMS can be thought of as a DBMS stripped to the
bones. It supports the Data Language, DL, which is the only
language used to retrieve and change data and data descriptions
at any level in the intension-extension dimension. The
DL provides a set of primitive operations on any data element
or data description element at any level in the intension-extension
dimension of data description. Any compound operations
needed must be implemented as a tool in the Data
Management Tool Box using the primitive operations of the
DL. Data Management Tools are plug-compatible with the
core DBMS through the DL.

The basic idea behind our generator approach is to make the
schema designed in section 4 part of the data dictionary
schema above. By doing this, the process and product description
instances created through this schema will be stored as
data in the data dictionary. These data may in turn be interpreted
as an application schema controlling process executions
and product instances in a specific software engineering project.
The data describing these process executions and product
instances will therefore be stored as part of the application
data.

To make this work, the semantics of insert operations used 
through the data dictionary schema must guarantee that 1) 
process and product descriptions are inserted in the data dic- 
tionary, and 2) data structures are created at the application 
data level to hold process execution and product instances con- 
forming to these descriptions. 

The Generator Approach 

To understand the philosophy behind the generator approach 
we will consider the data dictionary schema (catalog) of a self- 
describing database system. 

One of the most important things contained in the data dictionary
schema is a relation of relations. Simplified, it looks
something like this:

As can be seen, the first few tuples in the relation of relations
contain a definition of that relation itself (it is self-describing)
and other relations describing the relational data model. The
next set of tuples define the first two relations we defined in
section 4, namely "aggregate_process_descript" and
"process_descript_name".

Whenever a set of tuples is inserted into this relation of relations,
the semantics of the insert operation further trigger the
creation of an empty structure to hold the extension (data) of
the defined relation. This means that the insertion of the
tuples defining "aggregate_process_descript" and
"process_descript_name" in the above example will result in
the creation of an empty structure in the data dictionary to
hold the extension of these relations.
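
The insert semantics described here can be sketched with a toy catalog in which inserting a defining tuple into the relation of relations also creates an empty extension for the newly defined relation. The class, its method names, and the attribute names used for the two relations are assumptions for illustration; the actual catalog layout is the one shown in the paper's figure.

```python
class Catalog:
    """Toy self-describing catalog: the relation of relations lists itself."""

    def __init__(self):
        # First tuple: the relation of relations defines its own attributes.
        self.extensions = {"relations": [("relations", ("name", "attributes"))]}

    def attributes_of(self, relation):
        for name, attributes in self.extensions["relations"]:
            if name == relation:
                return attributes
        raise KeyError(relation)

    def insert(self, relation, row):
        row = tuple(row)
        if len(row) != len(self.attributes_of(relation)):
            raise ValueError("row does not match the relation's attributes")
        if relation == "relations" and row[0] in self.extensions:
            raise ValueError(f"relation {row[0]!r} is already defined")
        self.extensions[relation].append(row)
        if relation == "relations":
            # Insert semantics: defining a relation creates its empty extension.
            self.extensions[row[0]] = []

catalog = Catalog()
catalog.insert("relations", ("aggregate_process_descript", ("aggregate", "component")))
catalog.insert("relations", ("process_descript_name", ("process", "name")))
```

After the two defining inserts, both relations exist with empty extensions and can receive tuples through the same `insert` operation.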


Let us now turn our attention to these empty structures. 
When we insert tuples in them we are inserting data that in 
turn can be interpreted as defining an application schema. 



In the illustration above we have inserted a set of tuples that
constitute an aggregate process description. The aggregate process
description defines the development (dev) process to consist
of analysis (anal), design (des), specification (spec), and
implementation (impl), and it defines the design to consist of
high-level design (hl_des) and low-level design (ll_des). The
p-values are surrogates produced by the system.

These aggregate process descriptions will result in the creation
of two empty structures at the application data level, one for
the aggregate development process and one for the aggregate
design process. Into these empty structures we can store data
about specific executions of the defined development and design
processes.
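
The example can be replayed in a short sketch. The surrogate values p1 to p7 are hypothetical (the paper only says that p-values are produced by the system), hl_des and ll_des are assumed spellings of the abbreviations for high-level and low-level design, and the function name is an assumption.

```python
# Hypothetical surrogates and the names they stand for.
names = {"p1": "dev", "p2": "anal", "p3": "des", "p4": "spec",
         "p5": "impl", "p6": "hl_des", "p7": "ll_des"}

# (aggregate, component) tuples inserted at the data dictionary level.
aggregate_process_descript = [
    ("p1", "p2"), ("p1", "p3"), ("p1", "p4"), ("p1", "p5"),
    ("p3", "p6"), ("p3", "p7"),
]

def create_execution_structures(aggregate_tuples):
    """One empty application-level structure per distinct aggregate process."""
    structures = {}
    for aggregate, _component in aggregate_tuples:
        structures.setdefault(aggregate, [])
    return structures

structures = create_execution_structures(aggregate_process_descript)
```

Six description tuples yield exactly two empty structures, one for the development process and one for the design process, ready to hold data about their executions.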

We could continue the example by 1) inserting into the relation
"aggregate_process_descript_seq" tuples defining the
sequence of the processes in the development process, 2) inserting
into the relation "aggregate_product_descript" tuples
defining the products relevant to the development process, and
3) inserting into the relation "process_product_i_o" tuples
defining which processes use and produce which products.

However, the example we have given is hopefully sufficient to 
illustrate the idea. 

Conclusions and Future Research 

Ideally, the information base for software engineering described
in this paper will provide support for the automatic generation
of an information base from a formal specification of a set of
process and product descriptions. In order to further develop
this idealized information base, more research in the areas of
software engineering and databases is required. Although we
only list the major database research issues, we strongly believe
that success in this research area will depend on the tight
cooperation between the two areas. (The software engineering
research issues are listed in [Rombach 88].)

Future Database Research Issues 

There is currently no data model, let alone a database management
system, capable of supporting a meta information base
for software engineering. One of the goals of our research is to
develop the concepts and tools that are missing. For now,
we are taking a very conservative approach, trying to
use and extend existing relational database technology
to see how far it will take us. There are especially two
things from existing relational technology that we would like to
preserve: a non-procedural calculus query language, and a constraint
definition capability based on this calculus. We see no
conflict between preserving these and at the same time providing
a more object oriented data manipulation interface between
the software engineering oriented model and the database
model on which we have concentrated in this paper.

Providing an object oriented data manipulation 
interface between the software engineering
oriented model and the database oriented model 
will be our next major research topic. 

We plan to use the insert, delete, and update operations 
provided in the relational calculus to program transac- 
tions that will allow us to create, aggregate, decompose, 
generalize, specialize, and delete process and product 
descriptions in a consistent way.

An information base for software engineering must ideally be
adaptable to meet the needs for continuously tailoring software
engineering processes and products to changing project needs
and characteristics of the project environment and the organization.

An important research issue is therefore the handling
of data when its corresponding schema
changes. The self-describing database system provides
an ideal framework for investigating this
issue.

Given a formal specification of a programming language, a
document form, etc., it is theoretically possible to automatically
produce the schema needed to explicitly represent the
internal structure of all objects produced according to the
formalism. Is it practical?

A major research question is where the “invisible”
part of the database ends and the “visible” part
begins. Our approach to this question is to try to 
push the existing database technology as far as 
possible. 

However, we will eventually have to face the problem of com- 
plex lexical object types. 

A major research problem is therefore the support
of extensibility which allows for user defined lexical
object types.

Two possible solutions, representing the main streams in object
oriented database research, are to provide tool access to complex
lexical objects through the query language or to store user
defined operations on complex lexical objects in the database.
The difference between the solutions is minor.

A long list of additional research problems can be derived from
the list of specific execution requirements presented in section 2.
Many of these research problems are discussed in [Bernstein 87]
and are not repeated here.

Software Engineering is a challenging new application area for
information bases. The new challenges are twofold: software 
engineering specific (how can we model software engineering 
processes and products properly?) and information base specific 
(how can we mirror such processes naturally within an information 
base?). Meeting these challenges requires joint software 
engineering/information base research. The Meta Information Base 
project at the University of Maryland represents such a joint 
research effort. This project aims at generating customized software 
engineering information bases from formal specifications of soft- 
ware engineering processes and products. The three central 
research topics are to develop (i) a software process specification 
language which allows us to capture all the information necessary 
to understand, control and improve any given software engineering 
process, (ii) an object oriented information base schema language 
which allows us to model the mirroring information base structure 
for any such software engineering process, and (iii) a mapping 
between the software engineering oriented and information base 
oriented models. If an information base is truly expected to mirror 
a given software engineering process, it needs to be tailorable to 
the changing characteristics of the software process itself. The 
generator-based approach suggested in our project seems to be the 
natural approach to satisfy this important need. Software process 
and product specifications are expected to have not only an impact 
on generating customized software engineering environment com- 
ponents (such as information bases). Systematic improvement of 
software processes and products - learning about software engineer- 
ing approaches and reusing software engineering related experience 
- can not be achieved without having a specification of the objects 
we want to improve. This paper discusses general requirements for 
software process specification languages, presents a first prototype 
software process specification language, demonstrates the applica- 
tion of this language and derives software engineering related 
requirements for a supporting information base. The actual efforts 
aimed at implementing these information base requirements are 
briefly mentioned in the conclusions. 


1. Introduction

Lessons learned from having monitored the software development
and maintenance process over a decade [1, 11] suggest a high-level
improvement oriented software engineering model consisting of
planning, execution, and learning & feedback stages [4]:

• Planning the software engineering process is aimed at defining
plans for developing quality a priori. It includes choosing the
appropriate overall process model as well as the specific methods
and tools supporting this process model. It involves tailoring
each of them for the project specific goals and the characteristics
of the project environment and the organization. Process
models, methods and tools need to be planned for construction as
well as learning and feedback. The effectiveness of this planning
process depends on the precision in the specification of the process
models, methods and tools (formal is better than heuristic)
and the experience concerning their effects. The entire planning
process as well as the tailoring process need to be formalized.

• Execution of the software engineering processes follows the 
plans derived during planning; the existence of construction 
guidelines helps in assuring that process models, methods and 
tools are being used as intended. It should be noted that execu- 
tion includes the construction of the traditional project docu- 
ments (e.g. requirements, design, code) and all other kinds of 
information prescribed by the planning process (e.g., test results,
schedule, effort data), as well as the analysis of the construction 
processes and resulting products from various (during planning 
prescribed) perspectives. 

• Learning and feedback follows the plans defined during plan- 
ning. Learning is in part based on the analysis results derived 
during execution of processes (e.g., regarding the use of process
models, methods and tools) as well as products. We compare the 
actual results with the desired results, and feed the lessons 
learned back into the ongoing project (which might result in 
iterating the project plans) or into the planning of future pro- 
jects. Feedback is important to engineers and managers. An 
effective feedback mechanism is especially crucial for supporting 
the complex management decision process. 

Software engineering processes need to possess the attributes tailor- 
able and tractable. Tailorability is required in order to plan the 
software engineering process for the project specific goals and pro- 
ject environment characteristics. Tractability is required in order to 
specify processes in an understandable way, construct products 
according to these plans, and monitor the construction for the purpose
of feedback and learning. The TAME (Tailoring A Measurement
Environment) project at the University of Maryland aims at
the development of a measurement, feedback and planning
environment for software engineering [4]. Part of this project is to
develop a software engineering information base. The development
of a process and product specification language (although necessary)
is not part of the current scope of the TAME project.

Our Meta Information Base project at the University of
Maryland represents a joint software engineering/information base 
research effort. The basic idea of this approach is to generate cus- 
tomized software engineering information bases from formal 
specifications of software engineering processes. The three central 
research topics are to develop (i) a software process specification 
language which allows us to capture all the information necessary 
to understand, control and improve any given software engineering 
process, (ii) an object oriented information base schema language
which allows us to model the mirroring information base structure
for any such software engineering process, and (iii) a mapping
between the software engineering oriented and information base
oriented models. The generator approach acknowledges the fact
that software engineering processes change from environment to
environment, but also from project to project. If an information
base is truly expected to mirror a given software engineering process,
it needs to be tailorable to the changing characteristics of the
software process itself. The generator-based approach suggested by
our project seems to be the natural approach to satisfy this important
need. Generating customized software information bases is
not the sole application of software process specifications. We are
also investigating the benefits of software specifications for the purpose
of better understanding, planning and improvement of software
engineering related aspects. We believe that learning about
software engineering and reusing software engineering related
experience can not be done in a systematic way without specifying
the objects of learning and reuse - the software processes - them- 
selves. In order to do a good job of learning and reuse, measuring 
and analyzing the software processes and their effects seems to be a 
very helpful mechanism. We therefore suggest not only to model 
the construction oriented software engineering aspects, but also the 
analysis oriented ones. 

Based on our improvement oriented TAME software process model, 
we anticipate the following framework for supporting software 
engineering processes (see figure 1): 



Figure 1: Framework for SE Process Support

Each software engineering project consists of a planning and execution
stage. During the planning phase plans (specifications) of all
project relevant processes and products get developed; the execution
stage consists of conducting the project according to these
plans. The underlying information base stores all process and product
plans as well as the information derived during execution of
these plans. The plans themselves provide the basis for structuring
the execution-derived information. Storing such information across
projects results in historical information bases. Improvement can
then be achieved by structuring this information appropriately
(based on process plans), and reusing it during the planning and
execution phase of future projects after tailoring it to the specific
characteristics of these future projects. Figure 1 suggests that we
need to specify software processes and products for different purposes:
to support the planning activities at the user interface, to
allow the internal representation of plans, and to support the
storage and retrieval of plans and information derived during execution
according to plans. In our project, we expect to use three
different (but compatible) specification languages in order to satisfy
the different needs of each perspective.

This paper presents the software engineering oriented part of our
joint project*. It discusses general requirements for software process
specification languages, presents first prototype software
specification languages (one to support the planning activities at 
the user interface, one to represent plans internally), demonstrates 
the application of these prototype languages, and derives software 
engineering related requirements for a supporting information base. 
The information base related work of our project, aimed at imple- 
menting these software engineering oriented information base 
requirements, is not part of this paper. 


2. Requirements for Software Process & Product Specification

We distinguish between very general requirements which ack- 
nowledge the basic nature of software processes, and more concrete 
requirements whose relative importance depends on the purpose of 
software process and product specifications. 

General requirements for a software process specification language
include the ability to

1. specify all aspects that seem to be important within a given software
project (and not to be limited to a specific set of aspects):
This requirement acknowledges the fact that there exist no commonly
accepted software process models today.

2. specify with varying degrees of detail and to refine initial
specifications in the future as we learn: This requirement acknowledges
the fact that our understanding of some processes is
insufficient, of others is pretty precise.

3. deal with creative and mechanical aspects of software processes
in different ways (e.g., behavioral specifications for creative
aspects and algorithmic specifications for mechanical aspects):
This requirement acknowledges the fact that software processes
include both creative and mechanical aspects, and that we must
deal with both in a natural way.

4. easily modify process specifications: This requirement acknowledges
the constant need for tailoring process specifications
to changing project or environment needs.

Specific requirements (from a planning perspective) for a 
software process and product specification language include the 
ability to specify 

1. process, product, and constraint types

2. use (input) and produce (output) relationships between process 
and product types 

3. (pre- and post-) condition relationships between constraint and
process/product types (A pre-condition of a process is a constraint
imposed upon initiation of this process; a post-condition
of a process is a constraint imposed upon termination of this process;
a condition of a product is a constraint imposed upon this
product.)

4. control flow relationships between process types (sequence, alter- 
nation, iteration, and parallelism) 

5. structural relationships between product types (sequence, alter- 
nation and iteration) 

6. dependency relationship between process types (Process P1 is
dependent on process P2, if every execution of P2 triggers simultaneous
execution of P1. Typically, measurement processes are
dependent on the construction processes they are supposed to 
monitor.) 

7. aggregation and decomposition of process and product types 


* We present another paper discussing the information base oriented part of our
project during the "Database Formalisms, Software & Systems" session of this
same conference [10].

8. generalization and specialization of process and product types

9. constructive as well as analytic (measurement oriented) product 
and process types 

10. different roles (Different roles are performed in a software
project such as design role, test role, quality assurance role, or
management role. Roles define views or perspectives of (a subset
of) the processes and products relevant to a particular project.
Type and number of roles may change from project to project.)

11. time (relative and absolute) and space (software structure,
versions, configurations) dimensions

12. non-determinism due to user interaction 

Specific requirements (from an execution perspective) for a 
software process and product specification language include the 
ability to handle 

1. the instantiation (creation of objects) of process, product and 
constraint types 

2. long-term, nested transactions (Many software engineering 
processes such as designing may stretch over weeks or months; in 
addition, they may contain nested activities.) 

3. varying degrees of persistence (Some information needs to be kept
forever, some only for the duration of the project, and some
only until a new instance (e.g., product version) has been
created.)

4. tolerance of inconsistency (Because of the long-term nature of 
software engineering processes, it might be necessary to store 
intermediate information that does not yet conform with the 
desired consistency criteria.) 

5. dynamic types (type hierarchies) (An object of type product (e.g.,
a compiler developed during one project) may be used as an
object of type process during a future project.)

6. dialogues between processes (including human beings) 

7. dynamic changes of process specification types (It is impossible
to plan in advance for all possible (non-deterministic) results
produced by human beings. However, we would like to react to
those situations by dynamically re-planning during execution.
Although it is no problem to change a specification during the
planning stage, it might be a problem to change it during execution
while preserving the current execution state.)

8. back-tracking due to execution failures

9. the organization of historical sequences of product objects and 
process executions 

10. the enormous amounts of interaction between parallel activities 

11. the role-specific interpretation of facts (The same process and 
product facts might require different interpretation in the context 
of different roles.) 

12. the triggering of actions (based on pre- and post-conditions) 

The list of execution-point-of-view requirements is heavily
influenced by the results of a working group during the 4th
International Workshop on Process Specification (Moretonhampstead,
UK, May 1988), chaired by Tom Cheatham [12].


3. Prototype Process & Product Specification Languages

Several research projects are working towards improving the
software development process from various perspectives: Arcadia [13],
TAME [2, 3, 4], GENESIS [15], and others [12]. No consensus
seems to have been reached as to what an appropriate specification
language should look like in order to be both capable of describing
the important process and product aspects and acceptable to the
intended user.

We believe that no single specification language will satisfy the 
needs of software engineers as well as the designers of the informa- 
tion base. Based on our SEE model in figure 1, we believe that 
there is a need for at least three different language representations: 


• the application level language, which is used to support the 
task of specifying the relevant process and product aspects dur- 
ing the planning stage (at the user interface of our SE process 
model in figure 1). This type of specification language should 
accommodate the needs of its potential users (e.g., software 
engineers, managers). 

• the intermediate level language, which is used to represent
the results of the planning stage. This type of specification
language should emphasize completeness, consistency, and
preciseness. Complete in this context means executable, independent
of whether this execution requires user interaction or not.

• the information base level language, which is used to formulate
the storage and retrieval needs of software processes and
products. These needs encompass the process and product
specifications themselves as developed during the planning stage,
as well as the information accumulated during the execution of
those plans during the execution stage. This kind of language is
usually referred to as schema language.

In addition, we need to provide for transformations between
adjacent language levels. The application level language representation
of a particular software engineering process or product (e.g., the
design process) eventually needs to be transformed into the
appropriate information base level language representation
(schema). This transformation must preserve consistency. The
intermediate level language representation can be looked at as a
reference representation acceptable to both the software engineering
and information base perspective. The separation provides
independence of application and information base representations,
and it allows us to separate the entire research area into two
clearly distinguished but connected (via the intermediate level)
areas. Ideally, these transformations should be automated; this
would allow us to completely hide the information base view from
the software engineer and vice versa.

In the following two subsections, we introduce first prototype lan- 
guages for the application and intermediate level. 

3.1. A Prototype Application Level Specification Language 

Our prototype process and product specification language for the 
application level is graphically oriented. At this point it provides 
graphical elements satisfying the first eight specific planning 
oriented requirements listed in section 2: 

1. Three kinds of object types: process types (represented
as boxes), product types (represented as circles), and
constraint types (represented as rhombs).

Figure 3.1(a): Object Types

The concept process is used for all kinds of software engineering
activities. It comprises the elements of our high-level software
engineering model (planning, construction, learning and
feedback), overall software process models such as the "waterfall"
[7, 16], "iterative enhancement" [5] or "spiral" [8] model,
complex methodologies such as the "Cleanroom" [9] methodology,
particular methods and tools such as "top-down design" or
"Jackson design", and even individual statements of an
automated tool.

The concept product is used for all kinds of software engineering
information. It comprises the plans for construction and learning
and feedback produced by the planning process of our high-level
software engineering model, deliverable products produced by
the construction process such as requirements, design, code, but
also test data, schedule, resources, and all kinds of measurement
data.

The concept constraint is used to represent all kinds of software
engineering conditions (pre-conditions for the execution of a
process and post-conditions which are checked at process
termination time). Constraints may also be imposed on products.
Constraints are used to model schedules, completeness criteria or any
other kind of quality or productivity characteristic. Constraints
are basically expressed as boolean expressions.

2. Two kinds of relations between process and product 
types: the use relation (represented as a solid arrow con- 
necting a product and a process type) and the produce 
relation (represented as a solid arrow connecting a pro- 
cess and a product type) 



Figure 3.1(b): Use/Produce Relations 

In figure 3.1(b), process of type P1 uses products of type IP1 and
IP2 and produces products of type OP1, OP2, and OP3.

The relations use and produce are used to explicitly express all
kinds of information needed for executing a process and resulting
from its execution. Used information can range from experience
(for example, in the form of historical data), to products
produced during the same project by other processes, products
produced during prior projects, and characteristics of the project
and project environment. Produced information can range from
deliverable products (e.g., design or code documents) to
measurement data or even new process and product descriptions based
on learning.
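
As a hedged illustration (not part of the graphical notation itself), the use and produce relations of figure 3.1(b) could be recorded in a small data model. The names ProcessType, ProductType and can_start are assumptions made for this sketch, and the rule that a process may start only once all used product types are available is just one possible reading of the use relation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ProductType:
    name: str


@dataclass
class ProcessType:
    name: str
    uses: list = field(default_factory=list)      # "use" relation (inputs)
    produces: list = field(default_factory=list)  # "produce" relation (outputs)


def can_start(process, available):
    """Assumed rule: every used product type must already be available."""
    return all(product in available for product in process.uses)


# The situation of figure 3.1(b): P1 uses IP1 and IP2, produces OP1 to OP3.
ip1, ip2 = ProductType("IP1"), ProductType("IP2")
op1, op2, op3 = ProductType("OP1"), ProductType("OP2"), ProductType("OP3")
p1 = ProcessType("P1", uses=[ip1, ip2], produces=[op1, op2, op3])
```

Keeping both relations as explicit lists on the process type mirrors the two arrow directions of the figure; an information base could just as well store them as separate relation tables.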

3. Three kinds of relationships between constraint types
and process or product types: pre-condition, post-condition,
condition (represented as solid double arrows).



Figure 3.1(c): Constraint Relations 


In figure 3.1(c), constraint of type c1 is a pre-condition for a
process of type P1; constraint of type c2 is a post-condition for a
process of type P1; constraint of type c3 is a condition of a
product of type P2.

The constraint relationships are used to explicitly express all
conditions that need to be fulfilled before start or after
termination of a process, but cannot be expressed via use/produce
relationships or explicit control flow relationships between processes.
Examples are schedule, and all kinds of quality and performance
requirements. In addition, constraint relationships are used to
express expected characteristics of a product, e.g., maximum
complexity.
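
A minimal sketch of how such constraints might be checked, assuming that each constraint is a named boolean function over a project state. The function execute, the state keys, and the constraints c1 to c3 below are hypothetical; the product condition is checked together with the post-conditions purely for brevity.

```python
def execute(process, state, pre=(), post=()):
    """Run `process` on `state`, checking pre- and post-conditions."""
    for name, holds in pre:
        if not holds(state):
            raise RuntimeError(f"pre-condition {name} violated")
    process(state)
    for name, holds in post:
        if not holds(state):
            raise RuntimeError(f"post-condition {name} violated")
    return state


# Hypothetical schedule and quality constraints as boolean expressions.
c1 = ("c1", lambda s: s["calendar_time"] >= s["start_date"])
c2 = ("c2", lambda s: s["calendar_time"] <= s["due_date"])
c3 = ("c3", lambda s: s["complexity"] <= s["max_complexity"])


def design(state):
    # Stand-in for real work: time passes and a product is characterized.
    state["calendar_time"] += 2
    state["complexity"] = 12


state = {"calendar_time": 5, "start_date": 5, "due_date": 10,
         "complexity": None, "max_complexity": 20}
execute(design, state, pre=[c1], post=[c2, c3])
```

Raising an error on a violated constraint is only one option; an environment could instead trigger re-planning when a constraint fails.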

4. Four kinds of control flow relations between process
types: the sequence, alternation, iteration and parallelism
relations (represented as solid arrows between process
types; parallel control flow is indicated through the
augmentation of the corresponding arrows with "||").

The semantics of sequential control flow is obvious. The
semantics of alternate control flow is to execute exactly one of the
alternatives. The selection criterion can be expressed in terms of
a pre-condition on each of the alternative processes. Alternation
is completely deterministic if each of the alternate processes
possesses a pre-condition and all pre-conditions are mutually
exclusive. It is possible to have nondeterministic alternation (no
constraints) or incomplete alternation (no alternative applies
under certain circumstances). The semantics of iterative control
flow is to execute some process repeatedly. The negated
termination criterion is provided in form of a pre-condition to the
iterated process. It is possible to specify indefinite iteration (no
termination constraint). The semantics of parallel control flow is
to execute all parallel processes independent of each other.
However, all of them must be completed in order to satisfy the
parallel control flow relation.



Figure 3.1(d): Control Flow Relations 

In figure 3.1(d), process of type P1 is in sequence with process of
type P2, processes of type P5 and P6 are alternatively executed
after P4, process of type P3 is iteratively executed, and processes
of type P8 and P9 are executed in parallel (independently). The
decision whether to execute P5 or P6 can be based on two
mutually exclusive pre-conditions C5 and C6.

Note: The graphical symbols for data and control flow are dis- 
tinguished by their context. Arrows representing data flow con- 
nect processes and products, whereas arrows representing control 
flow connect just processes. 
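
The control flow semantics described above can be sketched as four small combinators. This is an illustration under stated assumptions, not the notation's definition: processes are modeled as Python functions over a shared state dictionary, the combinator names are invented here, and this sketch rejects non-deterministic alternation although the notation permits it. The arrangement of P1 to P9 in the usage part is made up and does not reproduce figure 3.1(d).

```python
from concurrent.futures import ThreadPoolExecutor


def sequence(*processes):
    def run(state):
        for process in processes:
            process(state)
    return run


def alternation(*guarded):
    """guarded: (pre_condition, process) pairs; exactly one must apply."""
    def run(state):
        enabled = [process for pre, process in guarded if pre(state)]
        if not enabled:
            raise RuntimeError("incomplete alternation: no alternative applies")
        if len(enabled) > 1:
            raise RuntimeError("pre-conditions are not mutually exclusive")
        enabled[0](state)
    return run


def iteration(pre_condition, process):
    """Repeat while the pre-condition (negated termination criterion) holds."""
    def run(state):
        while pre_condition(state):
            process(state)
    return run


def parallel(*processes):
    """Run all processes independently; return only when all have completed."""
    def run(state):
        with ThreadPoolExecutor() as pool:
            futures = [pool.submit(process, state) for process in processes]
            for future in futures:
                future.result()  # re-raises any failure of a parallel process
    return run


def step(name):
    def run(state):
        state["trace"].append(name)
    return run


def review(state):
    state["trace"].append("P3")
    state["open_issues"] -= 1


flow = sequence(
    step("P1"), step("P2"),
    iteration(lambda s: s["open_issues"] > 0, review),
    alternation((lambda s: s["size"] > 10, step("P5")),
                (lambda s: s["size"] <= 10, step("P6"))),
    parallel(step("P8"), step("P9")),
)
state = {"trace": [], "open_issues": 2, "size": 3}
flow(state)
```

The relative order of P8 and P9 in the recorded trace is not fixed, which reflects the independence of parallel processes.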

5. Three kinds of structural relations between product
types: the sequence, alternation, and iteration relations
(represented in the same way as control flow between
process types).

The relation 'sequence' indicates the sequential composition of
two products; the relation 'alternation' indicates the alternate
inclusion of either of two products, and the relation 'iteration'
indicates the repeated occurrence of a product (0 or more times).
We use the same relation names to express the control flow
composition of process types and the structural composition of
product types to minimize the number of concepts.
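
Because sequence, alternation and iteration (0 or more times) are exactly the operators of regular expressions, one hedged way to check a concrete product against its structural specification is to translate the structure into a regular expression. The helper names below are assumptions of this sketch, and the example structure (one 'hld' followed by any number of 'lld' components) is only illustrative.

```python
import re


def product(name):
    # One component, terminated by a space so that names cannot run together.
    return re.escape(name) + " "


def seq(*parts):
    return "".join(parts)


def alt(*parts):
    return "(?:" + "|".join(parts) + ")"


def rep(part):
    return "(?:" + part + ")*"  # 0 or more occurrences


def conforms(structure, components):
    """True if the list of component names matches the structure."""
    text = "".join(name + " " for name in components)
    return re.fullmatch(structure, text) is not None


# Illustrative structure: a design consists of one 'hld' and any number of 'lld'.
design_structure = seq(product("hld"), rep(product("lld")))
```

For instance, conforms(design_structure, ["hld", "lld", "lld"]) holds, while a list that starts with "lld" does not conform.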

6. A dependency relationship between processes
(represented as dotted double arrows between two
processes).

Figure 3.1(e): Dependency Relation

In figure 3.1(e), P1 is dependent on P2.

The dependency relationship is used to express a very tight
form of parallelism between processes. The relation is directed
and defines a master/slave relationship in the sense that
whenever the master process is in execution, the slave process gets
executed too. This means more than just to start and terminate
at the same time; it means absolutely synchronized execution.
This concept allows us to model the measurement of software
processes. For example, if we have a design process and we
would like to collect all the effort spent on designing, we model
the design process as the master process and the effort
measurement process as the slave process.
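
The effort measurement example can be sketched as follows, assuming the master process is broken into steps and every dependent (slave) process observes each step as it happens. EffortMeter, run_master, and the step data are hypothetical names and values chosen for illustration.

```python
class EffortMeter:
    """Slave process: accumulates effort while its master executes."""

    def __init__(self):
        self.total_hours = 0.0

    def observe(self, activity, hours):
        self.total_hours += hours


def run_master(steps, slaves):
    """Execute the master step by step; every slave sees every step."""
    for activity, hours in steps:
        # The master's real work for `activity` would happen here.
        for slave in slaves:
            slave.observe(activity, hours)


design_steps = [("identify modules", 6.0), ("define interfaces", 3.5)]
meter = EffortMeter()
run_master(design_steps, [meter])
```

Calling the slave inside the master's own loop is a simple way to approximate synchronized execution; it guarantees that no master step goes unmeasured.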

7. A relation between process or product types allowing for
decomposition and aggregation: the is_part_of relation
(represented as dashed arrows augmented with the
relation name).



Figure 3.1(f): Decomposition/Aggregation Relations

In figure 3.1(f), process type P1 is decomposed into (and
completely substituted by) process types P11 to P1n. Product type
P2 is similarly decomposed into product types P21 to P2m.

We need to allow for decomposition and aggregation of process 
and product types. The decomposition is necessary to describe 
the refining of some process or product into more precise (less 
abstract) processes or products. 

Note: Decompositions are level complete. This means, if a
process type P1 is decomposed into process types P11, P12, P13, and
P14 (see (6)), then these four processes together make up the
entire functionality of P1 (they entirely substitute P1).

For example, the overall process "development" might be refined
into "requirements analysis", "design", "coding", etc.; similarly,
we can refine the product "deliverables" into products
"requirements document", "design document", "source code documents",
etc.

Decomposition is also necessary in order to reflect the hierarchy
of product structure. For example, "system" might be recursively
decomposed into "subsystems", "components" and "modules".

We use the concept process recursively in two ways. Each pro- 
cess type can be decomposed into lower-level process types or can 
be included into the aggregation of higher-level process types. 
This use of the term process can reduce the difference between 
an informal method and a concrete automated tool supporting 
this method to a difference in the degree of formalism in the 
specification. Whereas the method might be described in informal
English, the tool might be the complete algorithmic formalization
of the same process. The second possibility of using process
types recursively is specialization and generalization. In the
case that one method can be automated by a variety of tools
alternatively, we can view the tools as specializations of the
method, or the method as a generalization of those specific tools.

We use the concept product recursively in the same two ways as 
processes. 

The relations sequence, alternation, iteration and parallelism (see 
4.) are used in the context of decomposing and aggregating pro- 
cess or product types. 

The semantics of these relations in the context of a process type 
decomposition is as follows: Each decomposed process type either 
(a) inherits the entire set of use and produce relations of the 
aggregated process types, (b) inherits parts of the use and pro- 
duce relations of the aggregated process types, (c) uses product 
types produced by a different decomposed process type and pro- 
duces product types to be used by a different decomposed process 
type, or all possible combinations of (a), (b), and (c). According
to (2), each decomposed process type requires at least one
product type for use and production. The functionality of the
aggregated process type is identical to the functionality achieved by
all decomposed process types if executed according to their
control flow relationships.
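
The decomposition rules (a) to (c) lend themselves to a mechanical consistency check. The sketch below is one possible reading, with an assumed dictionary representation of process types and hypothetical example types P1, P11 and P12: every input of a decomposed process type must come from the aggregated type or from a sibling, and every output must go to the aggregated type or to a sibling.

```python
def check_decomposition(parent, children):
    """Return a list of violations of rules (a)-(c); empty means consistent.

    Each type is a dict with 'name', 'uses' and 'produces' (sets of names).
    """
    problems = []
    for child in children:
        siblings = [c for c in children if c is not child]
        made_by_siblings = set().union(*(c["produces"] for c in siblings))
        used_by_siblings = set().union(*(c["uses"] for c in siblings))
        if not child["uses"] or not child["produces"]:
            problems.append(f"{child['name']}: needs a used and a produced type")
        for p in child["uses"] - parent["uses"] - made_by_siblings:
            problems.append(f"{child['name']}: unexplained input {p}")
        for p in child["produces"] - parent["produces"] - used_by_siblings:
            problems.append(f"{child['name']}: unexplained output {p}")
    return problems


# Hypothetical example: P1 (A -> C) decomposed into P11 (A -> B), P12 (B -> C).
p1 = {"name": "P1", "uses": {"A"}, "produces": {"C"}}
p11 = {"name": "P11", "uses": {"A"}, "produces": {"B"}}
p12 = {"name": "P12", "uses": {"B"}, "produces": {"C"}}
ok = check_decomposition(p1, [p11, p12])

# A sub-process without input whose output nobody needs is reported.
p13 = {"name": "P13", "uses": set(), "produces": {"X"}}
bad = check_decomposition(p1, [p11, p12, p13])
```

The check covers only the data flow side; that the decomposed processes together provide the full functionality of the aggregated process cannot be verified this way.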

8. A relation between process or product types allowing for
specialization and generalization: the is_a relation
(represented as dashed arrows augmented with the
relation name).



Figure 3.1(g): Specialization/Generalization Relations

In figure 3.1(g), each of the process types P11 to P1n is a
specialization of process type P1, and each of the product types P21 to
P2m is a specialization of product type P2. P1 is a generalization
of each of P11 to P1n, and P2 is a generalization of each of P21
to P2m.

We need to allow for specialization and generalization of process
and product types. Generalization of a set of process and
product types allows us to group them according to some common
aspect.

For example, we can generalize compilers for all kinds of
languages to a general compiler process that translates an
algorithmic source code document into object code. Another example
is viewing all tools supporting a specific method alternatively as
specializations of that method.
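
As a small sketch of the is_a relation, the compiler example can be stored as a mapping from each type to its generalization. The type names below (pascal_compiler, ada_compiler, translator) are hypothetical and chosen only to illustrate the lookup in both directions.

```python
# Each entry reads "<key> is_a <value>"; the names are illustrative only.
IS_A = {
    "pascal_compiler": "compiler",
    "ada_compiler": "compiler",
    "compiler": "translator",
}


def generalizations(type_name, is_a=IS_A):
    """Follow the is_a relation upwards, nearest generalization first."""
    chain = []
    while type_name in is_a:
        type_name = is_a[type_name]
        chain.append(type_name)
    return chain


def specializations(type_name, is_a=IS_A):
    """Direct specializations of a type, in alphabetical order."""
    return sorted(child for child, parent in is_a.items() if parent == type_name)
```

With this table, generalizations("ada_compiler") yields the chain from "compiler" up to "translator", and specializations("compiler") lists both language-specific compilers.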

In addition we satisfy the specific planning oriented requirements 9 
and 10 as follows: 

9. We encourage the awareness of constructive and analytic
process and product aspects. Again, the control flow
and structural relations defined for the graphical notation allow
for representing all kinds of decompositions and aggregations.
The success of software projects depends on a sound integration
of constructive and analytic aspects as indicated by our
high-level software engineering model. This fact does not mean that
we should not view them as different aspects.

The constructive aspects are concerned with generating products,
while the analytic aspects are concerned with secondary
information derived from monitoring and analyzing constructive
processes and products.

10. We allow for the definition of different roles. Roles are
defined as projections onto the set of process and product types
defined for some project. They define specific views or
perspectives. Different views may include the same process or product
type. For example, the design role and the quality assurance role
may both be interested in the design product, but from very
different perspectives. Whereas the design role is interested in
how to build the design product best, the quality assurance
role emphasizes the adherence of the design product to stated
quality requirements. One role may be performed by several
people, or one person may execute several roles. The number of
roles is not predefined, but rather project specific. Roles are
defined explicitly. In practice, different roles will very often be
specified by people with different project experience.
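
Roles as projections can be sketched with plain set operations. The type names below are illustrative assumptions for a design-oriented project, and the dictionaries TYPES and ROLES are hypothetical structures rather than part of the notation.

```python
# All process and product types defined for a (hypothetical) project.
TYPES = {
    "hl_design": "process", "ll_design": "process", "compute_v": "process",
    "d": "product", "v": "product",
}

# Each role is a projection: the subset of types visible to that role.
ROLES = {
    "design_role": {"hl_design", "ll_design", "d"},
    "quality_assurance_role": {"compute_v", "v", "d"},
}


def view(role):
    """The process and product types seen from the given role."""
    return {name: kind for name, kind in TYPES.items() if name in ROLES[role]}


def shared(role_a, role_b):
    """Types that both roles are interested in."""
    return ROLES[role_a] & ROLES[role_b]
```

Here the design product 'd' appears in both views, which matches the observation that different roles may look at the same product from different perspectives.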

We believe that the concepts and principles presented in this sec- 
tion provide a promising basis for building process and product 
specification languages. The objective is to be able to specify all 
aspects of a software process or product (completely), according to 
a set of unifying principles (consistently), and to the level of detail 
possible due to the nature of the problem and our understanding 
(precisely). 

3.2. A Prototype Intermediate Level Specification Language

The intermediate level specification language is specified in BNF- 
style. Appendix (A) contains the syntax rules necessary to specify 
process types. The necessary context rules are not included in 
Appendix (A). 

This language allows us to specify a given process type (or product
type) at any desirable and possible level of detail. Each process
type specification consists of a process_heading and a process_body.
The process_heading describes the unique process_type_name,
whereas the process_body contains the actual specification. The
process_body consists of a process_specification_part, a
role_specification_part, and a resource_assignment_part. The
process_specification_part contains an interface_part, a
refinement_part, and an implementation_part. The interface_part
characterizes the used and produced product types and the
attached constraint types. The refinement_part describes the
refinement of this process type into lower level processes (includes
also the refinement of the related products and constraints) and
defines their connections at this lower level. The implementation_part
contains the final algorithmic implementation of a process
type. Refinement and implementation parts exclude each other;
either a process type gets refined further or it is at its final level of
detail. Refinement and implementation parts are optional. The
role_specification_part defines all roles. Roles can be viewed as
'super' processes. The resource_assignment_part assigns resources
to processes and/or roles. The resources are specified like product
types. This includes the ability to refine them. If we have several
organizations of people involved in executing a certain process, we
can model each organization as a resource consisting of people
resources. The product body of product type specifications consists
only of a refinement_part and an implementation_part.
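
The structure of a process type specification can be mirrored, as a rough sketch, by nested record types. The field names follow the part names used above; the concrete Python types (dictionaries and strings) and the example values are assumptions of this sketch, since the actual syntax is given by the BNF rules in Appendix (A).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ProcessSpecificationPart:
    interface_part: dict                        # used/produced types, constraints
    refinement_part: Optional[dict] = None      # lower-level processes, if refined
    implementation_part: Optional[str] = None   # final algorithmic implementation

    def __post_init__(self):
        # Refinement and implementation are both optional but exclude each other.
        if self.refinement_part is not None and self.implementation_part is not None:
            raise ValueError("refinement and implementation exclude each other")


@dataclass
class ProcessType:
    process_type_name: str                      # the process_heading
    process_specification_part: ProcessSpecificationPart
    role_specification_part: dict = field(default_factory=dict)
    resource_assignment_part: dict = field(default_factory=dict)


# Illustrative instance: a refined process type without an implementation.
design = ProcessType(
    process_type_name="design",
    process_specification_part=ProcessSpecificationPart(
        interface_part={"uses": ["r"], "produces": ["d"]},
        refinement_part={"sub_processes": ["hl_design", "ll_design"]},
    ),
)
```

Enforcing the mutual exclusion at construction time means an inconsistent specification is rejected as soon as it is created.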

This prototype language allows us to satisfy the planning oriented
specific requirements listed in section 2. The current language
definition is by no means final. We plan on using it as an
experimental vehicle allowing us to validate whether the chosen concepts
are satisfactory for specifying all kinds of process and product
related software engineering aspects.


4. Application of the Prototype Specification Languages

The validity and usefulness of our software engineering process
model depends on whether we are able to (a) generate specifications
for all kinds of process and product types using our languages, (b)
make project personnel use the specification languages during
planning as well as the generated specific models during execution and
learning and feedback, and (c) generate an information base
supporting the planning of process and product types (store and reuse)
and the execution of instantiations of process and product types
(e.g., instantiation itself, storing information accumulated during
execution).

So far we have been able to specify a number of process and
product types using our graphical notation. The specified process
types include a variety of existing project models (e.g., [17]) as well
as specific development methods. The completely automated
version of a process type is a tool. We would be able to represent any
structured implementation of a tool using our control flow
relations.

The answer to part (b) requires more work. It seems that our
process model and languages will be useful during planning for
describing aspects of construction and learning and feedback as
well as the consumed and produced products completely, precisely
and as formally as possible. It should also help execution and
learning and feedback in that it should be easy to follow these kinds of
complete, consistent and precise plans. The degree to which
execution can be supported will depend on the degree to which we will
be able to satisfy the specific, execution oriented requirements
listed in section 2.

Our initial answer to part (c) is presented in [10].

In this section we will apply our two prototype specification
languages to a small example. We will introduce the example in
section 4.1, demonstrate the use of the graphical prototype application
level language in section 4.2, and show how the final plans can be
represented using our prototype intermediate level language in
section 4.3.


The example we have chosen to demonstrate the applicability of 
the two prototype languages is a subset of the design related 
aspects out of the context of a larger project. 

The example can be characterized as follows: 

Specify a process type for the design phase (named 'design') that
consumes a requirements product type ('r') and produces a design
product type ('d'). The design process type consists of two
sequential design sub-processes for high-level design ('hl_design') and
low-level design ('ll_design'). We want to use methods for
high-level design ('yourdon') and low-level design ('pdl'). The design
process will start on date t_1, and has to be completed by date
t_3 = t_1+3. High-level design should be completed by date t_2.
In addition, the actual effort spent for high-level design
('hleff') and low-level design ('lleff'), the number of low-level design
errors ('llerr'), and McCabe's complexity of the low-level design
products ('v') must be measured for quality assurance purposes. A
low-level design product will not be accepted if its complexity
value exceeds 20. The design process will be performed by five
people. One person is assigned to perform the high-level design.
Three people, including the person who performed the high-level
design, are assigned to perform the low-level design. A fourth
person is assigned to perform the quality assurance activities; a fifth
person is assigned to manage the project.
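
The acceptance rule for low-level design products can be made concrete with McCabe's cyclomatic complexity, v(G) = E - N + 2P, where E is the number of edges, N the number of nodes and P the number of connected components of a control flow graph. The following is a minimal sketch; the adjacency-dictionary representation and the function names are assumptions, and how a control flow graph would be derived from a 'pdl' document is left open.

```python
def cyclomatic_complexity(graph, components=1):
    """McCabe's v(G) = E - N + 2P for a control flow graph.

    `graph` maps each node to the list of its successor nodes.
    """
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    edges = sum(len(targets) for targets in graph.values())
    return edges - len(nodes) + 2 * components


def acceptable(v, limit=20):
    """A product is rejected only if its complexity exceeds the limit."""
    return v <= limit


# A single if/else: 5 nodes, 5 edges, one component, so v = 5 - 5 + 2 = 2.
if_else = {
    "entry": ["test"],
    "test": ["then", "else"],
    "then": ["exit"],
    "else": ["exit"],
    "exit": [],
}
v = cyclomatic_complexity(if_else)
```

A product with a complexity of exactly 20 is still acceptable under this reading, since the example only rejects values that exceed 20.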


We apply our graphical notation to specify all aspects of the
example described in section 4.1 except the assignment of people. The
sequence of specification steps is not predefined. We have chosen to
specify the example according to the three identifiable roles
(designer, quality assurance, manager):

1. DESIGN ROLE: 

1.1. Specification of the use and produce relationships associated
with process type 'design':



Figure 4.2(a): Specification of design process and products 

Figure 4.2(a) describes our initial specification of the example 
described in section 4.1. 

1.2. Decomposition of process type 'design' into 'hl_design' and
'll_design' (with sequential control flow) and decomposition of
product type 'd' into 'hld' and 'lld' (with sequential structure):



Figure 4.2(b): Decomposition of 'design' and 'd'

Figure 4.2(b) describes the decomposition of process type
'design' into process types 'hl_design' and 'll_design'. The
control flow between 'hl_design' and 'll_design' is sequential;
the aggregation of product types 'hld' and 'lld' into 'd' is
sequential.


1.3. Decomposition of 'hl_design' into 'yourdon' and 'lld' into
'pdl':



Figure 4.2(c): Decomposition of 'hl_design' and 'll_design'

Figure 4.2(c) describes the decomposition of process type
'hl_design' into 'yourdon' and product type 'lld' into 'pdl'. The
decomposition relation is used to describe this refinement; in
addition, we could also use the specialization relation to
indicate that 'yourdon' is a specific instance of 'hl_design' and
'pdl' of 'lld'.

2. QUALITY ASSURANCE ROLE: 

2.1. Specification of measurement oriented process and product 
types: 


Figure 4.2(d): Specification of measurement processes 



Figure 4.2(d) describes the measurement of 'hlerr' via process
type 'count_errors', 'hleff' via process type 'count_hl_eff', 'lleff'
via process type 'count_ll_eff', and the McCabe complexity 'v'
via process type 'compute_v'. The process types 'count_errors',
'count_hl_effort' and 'count_ll_effort' are dependent on process
types 'yourdon', 'yourdon' and 'll_design', respectively.

2.2. Specification of constraints: 





Figure 4.2(e): Specification of process and product constraints 

Figure 4.2(e) describes how the constraint types c_1, c_2, and
c_3 (which use the boolean expressions 'calendar_time = t1',
'calendar_time <= t2', and 'calendar_time <= t3') are
assigned as pre-condition to 'yourdon', post-condition to
'yourdon', and post-condition to 'll_design', respectively. The
constraint type c_4, which uses the boolean expression 'v(pdl) >
20', is assigned as a pre-condition to 'll_design' to indicate
another iteration.

3. MANAGEMENT ROLE: 

3.1. Specialization of 'hl_design' and 'lld':


Figure 4.2(f): Specialization of 'hl_design'

Figure 4.2(f) describes how the specific design method
'yourdon' can be categorized as a specialization of the process
'hl_design'. There exist other possible specializations, e.g., the
object oriented design method 'ood'.

4.3. Specification of the Example (Intermediate Level) 

In this section we give an example as to how the specification
information produced using the application level language in section 4.2
is represented internally using the intermediate level specification
language. Each of the objects (process and product types)
mentioned in section 4.2 is represented by a separate intermediate level
specification. However, each specification will combine all the
information that relates to a specific object completely, independent of
the sequence in which it had been created.
As an example, we give the specification of the process type
'design'. This specification includes the refinement of design into
'hl_design' and 'll_design' (step 1.2 in section 4.2), and all related
quality assurance aspects (see steps 2.1 and 2.2 in section 4.2).
The intermediate specification of this scenario is contained in
Appendix (C).

The interface_part contains product types 'r' and 'd' as well as
constraint types 'c_1' and 'c_3'. The next refinement level is
described in terms of decomposed types (see decomposition_part),
imported process, product, and constraint type specifications (see
use_part), and relations between all those types (see
connection_part). An iteration symbol is used in the decomposition_part to
indicate that a non-determined number of processes of type
'll_design' needs to be instantiated (one for each module). Each of
these instantiations will produce a product of type 'lld'. The
role_specification_part identifies all roles according to the way
information was provided at the user interface level (see section
4.2). In the resource_assignment_part, people resources (p_1, ...,
p_5) are assigned to execute certain roles ('quality_assurance_role',
'management_role') and/or process types ('hl_design', 'll_design').

4.4. Specification of the Example (Information Base Level)


The example in Appendix (C) is only one of the specifications to
be stored in a supporting software engineering information base. A
complete list of specification objects according to our example is
contained in Appendix (B). Remember, these objects comprise
only the planning part of what needs to be stored in an
information base. In addition, the information base must be capable of
storing all the information derived during execution of these
process type specifications.


5. Deriving Information Base Requirements

The role of software engineering information bases is to mirror the
software processes and products relevant to a project or entire
environment. Assuming our improvement oriented software
process model (consisting of planning and execution stages for each
project), a supporting information base needs to be capable of
storing the process plans as well as the execution derived information.
In our case, plans are the intermediate process specifications
introduced in section 3.2 and demonstrated in section 4.3. These plans
could also provide the necessary information for organizing the
execution derived information. Obviously, in the case of our example
in section 4, we would like to see all the plans listed in Appendix
(B) stored in an information base.

The important requirements for designing an information base
interface are identical to the requirements (general and specific)
listed in section 2. Additional requirements can be found in [6, 14].
The specific planning oriented requirements that are expected to 
allow us to specify all aspects of software processes and products 
seem not to be the problem as far as the information base is con-
cerned. It is not clear at this point whether all the execution
oriented specific requirements can be easily satisfied with state-of-
the-art database technology. It is not even clear whether all these
execution-oriented requirements should be dealt with inside a per-
sistent database at all. Our first approach to generating a software
engineering information base from process and product 
specifications is described in [10]. 


6. Current Status and Future Work

The specification research goal of our project is to develop a formal
language for specifying all aspects of software processes and pro-
ducts in a complete, consistent and precise way. We do not believe
that all aspects can be formalized in an algorithmic manner. How-
ever, we believe that even those creative aspects can be described
as integrated into the overall software development and mainte-
nance process; this integration would make them accessible to con-
trol to a certain degree. We have developed first prototype lan-
guage definitions and have applied them manually to specify small
but realistic software engineering scenarios. This limited experi-
ence seems to indicate that the concepts chosen for our languages
are promising. We need to experiment further with these languages
and refine them. We believe that feedback from a variety of peo-
ple is essential in order to improve them incrementally. It is, how-
ever, not realistic to expect other people to apply our languages
manually. Therefore, the most important next step is to prototype
both languages. Out of the list of twelve planning oriented specific
requirements, we are least satisfied with our solution to represent-
ing the relationship between time and space dimensions. Our
current specification approach seems to be too static. It is not pos-
sible to convey to a user the fact that, e.g., in the case of our
example in section 4, we have to instantiate the low-level design
process for each module that has been identified during high-level
design (or even for each person and module!). Other important
research aspects are related to the execution-related specific 
requirements listed in section 2. Most of all, we have to come up 
with a good mechanism for instantiating process and product 
types. 
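To make the instantiation problem concrete, the following Python sketch creates one process instance per module once the modules are known. It is a minimal illustration, not a proposed mechanism: `ProcessInstance`, `instantiate_per_module`, and the module names are hypothetical; only the type names 'll_design' and 'lld' come from the example in section 4.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessInstance:
    """One execution-time instance of a process type (hypothetical structure)."""
    process_type: str
    module: str
    output_product: str


def instantiate_per_module(process_type: str, product_type: str,
                           modules: list[str]) -> list[ProcessInstance]:
    """Create one instance of process_type for each module identified so far."""
    return [
        ProcessInstance(process_type, module, f"{product_type}[{module}]")
        for module in modules
    ]


# Module names are assumed; in practice they would result from high-level design.
modules = ["parser", "scheduler", "report_writer"]
instances = instantiate_per_module("ll_design", "lld", modules)
```

The number of instances is known only after high-level design has produced the module list, which is exactly the dynamic aspect a static plan cannot express.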

The information base research goal of our project is to develop an
(eventually) object oriented information base interface supporting
the planning and execution stages of software projects. The design
of this information base interface is inspired by the software
engineering oriented requirements. Eventually, we would like to
generate customized information bases from process and product
specifications. We have developed a first approach for mapping
process and product specifications into an information base schema
[10]. We have implemented a first prototype information base
(based on relational database technology). This prototype will be
used as a vehicle for validating and improving our approach. For
the future, it is planned to integrate this prototype into the proto-
type of the measurement and evaluation system TAME [2, 3, 4].
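As a rough illustration of how plans and execution derived information might coexist in a relational information base, consider the following Python sketch using an in-memory SQLite database. The table names, columns, and sample rows are assumptions made for this example and do not reproduce the schema mapping of [10].

```python
import sqlite3


def create_information_base() -> sqlite3.Connection:
    """Build a tiny, hypothetical relational information base in memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE process_plan (name TEXT PRIMARY KEY);
        CREATE TABLE decomposition (
            parent TEXT NOT NULL REFERENCES process_plan(name),
            child  TEXT NOT NULL REFERENCES process_plan(name)
        );
        CREATE TABLE execution_record (
            id           INTEGER PRIMARY KEY,
            plan         TEXT NOT NULL REFERENCES process_plan(name),
            module       TEXT NOT NULL,
            effort_hours REAL NOT NULL
        );
    """)
    return conn


def total_effort(conn: sqlite3.Connection, plan: str) -> float:
    """Sum the recorded effort of all executions of one process plan."""
    row = conn.execute(
        "SELECT COALESCE(SUM(effort_hours), 0) FROM execution_record WHERE plan = ?",
        (plan,),
    ).fetchone()
    return row[0]


conn = create_information_base()
# Planning information (process types and their decomposition).
conn.executemany("INSERT INTO process_plan VALUES (?)",
                 [("design",), ("hl_design",), ("ll_design",)])
conn.executemany("INSERT INTO decomposition VALUES (?, ?)",
                 [("design", "hl_design"), ("design", "ll_design")])
# Execution derived information (assumed sample data).
conn.executemany(
    "INSERT INTO execution_record (plan, module, effort_hours) VALUES (?, ?, ?)",
    [("ll_design", "parser", 12.0), ("ll_design", "scheduler", 8.5)],
)
```

Here the plan tables organize the execution records, in the sense that every record refers back to the process type it instantiates.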

The major future research issues besides refining and automating
our prototype specification languages are to effectively support the
reuse of process specifications, their tailoring to new project needs,
the different roles of a single process specification (role specific
interpretation of facts), and all the execution oriented specific
requirements listed in section 2.


7. Conclusions 

We are aware of the huge dilemma between the need for specifying
software processes and products and the unsatisfactory degree of
knowledge of how to do it properly. Understanding software processes
better is necessary for making progress in software engineering. 
Being able to specify a process is the fundamental basis for sound 
understanding, training, execution, control and improvement as 
well as generating appropriate automated support. 

Our two prototype specification languages reflect our current 
understanding of how to capture the important process and pro- 
duct related aspects. We believe strongly that the only way of 
improving our current understanding is experience from practical 
application. This requires us to have some initial language nota- 
tion. This statement should clarify the fact that we do not view 
these initial language definitions as being final. They represent a
vehicle for further learning. The initial applications (one of which
is described in this paper) have already helped us in understanding 
important process specification issues as well as in giving us a sense 
of the potential and limitations of our approach. 

We will continue refining our languages based on experience. We
are especially interested in using such process specifications as a
basis for generating customized environment components, e.g., soft-
ware engineering information bases [10]. We hope that this paper
will inspire other groups involved in process specification research 
as well as result in feedback from those groups regarding our initial 
approach. 
APPENDIX (A): Intermediate Level Language Definition:


> START 'LANGUAGE GRAMMAR' 

SR_1: <plan> ::= <process_plan> | <product_plan>

> START 'PROCESS_PLAN GRAMMAR':

SR_? 

> START 'PROCESS_PLAN_HEADING GRAMMAR':

SR.l PROCESSPLaN <»«b«i>i> 

> START 'PROCESS_PLAN_BODY GRAMMAR':

SR_4: <process_plan_body> ::= <process_specification_part> <role_specification_part>
<resource_specification_part>

> START 'PROCESS_PLAN_SPECIFICATION_PART GRAMMAR':

SR.S: <pTOTC»_«p»ciflc»tl«D_p*n> — PROCESS.5PECIF1 CATION J»ART 
<ctU(»> j»n> 

> START 'PROCESS_SPECIFICATION_INTERFACE_PART GRAMMAR':

SR_6: <process_interface_part> ::= INTERFACE_PART:
<comment> <consume_part> <produce_part> <constraint_part>

SR_7: <consume_part> ::= CONSUMES: <product_type_name_list>

SR_8: <produce_part> ::= PRODUCES: <product_type_name_list>

SR_9: <constraint_part> ::= CONSTRAINTS: <constraint_type_name_list>

> START 'PROCESS_SPECIFICATION_REFINEMENT_PART GRAMMAR':

SR.I*: Jk*«> REP1NEM ENT.P ART: 

<«•■!»> jun > 

<d-ro ■ po< k>*a_p*rl > 

<c*BB«rtiam > 

SR.I 1 •:« U3E_PART <Up*r _ urn 

SR.I* uNjtart.bady > — <libr»ry_ij>« > ] Kbbrirjjirp* > <•?+< w* jui | M d 7 > 

| <*«*!! > 


SR.I1: j>*n> -- DECOMPOSITION _P ART <d«ro spout »e >art_Wdy > 

SR_H: CdtrtBpoHttM^irt.Wr > 

<c »Qjir u&i^dcrooipomioa j»*n > ( 

SR.I* | 

<pro<-TM_drreup« niMB >. <prKM.dKtap«iw*i jurt > 

SR.I* <pr*«(Mjliriapuiti|i>::« <pr»rrM m * > DECOMPOSES.! N TO 
<pfKw.rtBKrMn > 

SR.I?: <pf*tm.r»iKr*fl > <SUl.LAN.l > 

SR.I*: <peo<l*wn_d«>e»nip*utio»_p*n> <prodwct_d*f»*p««iittB> | 

<pr*dMt.d«’ia|»tki«( >, <pr»dvA.d«»»p«Mi4Djin> 

SR.lt: <pr**tut_d*ro spowtluo > <pradun _>»** > DECOMP OSES .IN TO 

<prNiMt.e«uirMi> 

SR.**: <pr«d\ut.f wnnm > ;• <SU1_LAN_I > 

SR.tl: <f»aoiroiat.d*r»spe»kici j>ort > <t*utriiM,dfr«ap«iAi*i > | 

<t»Mtrta>.dtr»ap w ii iw >, <c»Miriiai_d«c«Mp*»'ti*B > 

SR.**: Ccoaoir »Jot_d^o»p«ok»oa > <twnniaiJTpt g ua«> DECOMPOSES. IN TO 

<c«anrtiil_(«MnMl> 

SR.il <c*utrti«.t*i«trwt> <3U» JwANJ > 
/ •**•****« _IM* >


mm> BJNPUT.TO 

OrWwt.w* _**»•> IS.OUTPU7_FROM <KW«.W^w«>! 


JH.I# < yfW WiJWBi 


niu > — <SUB-LaNGUAGE_I> 


> START 'PROCESS_SPECIFICATION_IMPLEMENTATION_PART GRAMMAR':


SR Mr <pr*rwm imptW^Utm _part> lMPL£MEWTAT!ON_PART: 
SR_||: <i«pk»-H»tf**_p*«.b^p>::- <SUS_LANCUAOI.I> | <m!I> 
> START 'PROCESS_PLAN_ROLE_SPECIFICATION_PART GRAMMAR':


APPENDIX (B): Process and Product Specification Types
(Example of section 4.1):


f.i, 

».1; 





APPENDIX (C): Process Specification of process 'design'
(Example of section 4.1):


SR_I1 <rfli.ipoiaW«j«»> <«••«> 

SR I1 <rtkj»ri> <r*k > j 0*W_pa*> | <mS > 

JR It <nk> <r*k_b**d**> <r*M_Wdp> 

M.u 0*W >» dr > ROLE 

SR_Sf : <r»W.W4j > 

SR.S7: n— INTERFACE; <iaMffa«t_U«> 

SR_S« <iawrf***_U*» > ::<• <1»brary_iyp*> ( 

sR.»t <jikf «7.«rp* > - > | <i»«4iM*_trp«_puM> j 


immmml): PROCESS.PLAN 
COMMENT: 

PROCESS SPECIFICATION PART: 
COMMENT: 

INTERFACE_PART: 

OOMMENT: «•*••*••••• 

CONSUMES: r; 

PRODUCES: 4; 

CONSTRAINTS: ej. e.I; 

REFINEMENT PART: 
COMMENT: 


SR .40: 


• _pa/t> i < |n w .ifft.M > 


> START 'PROCESS_PLAN_RESOURCE_ASSIGNMENT_PART GRAMMAR':

SR.4I <rwum.iHf IMU jvt > <(»«■«> <rw»R.wjm> 

<mwrn.iMi|iW»iJt't> 

SR_Sf : < nt*«Wt .ttWJWt > ::■» USE.PART. > 

SR .41: < Httww .wtjtft.M4y> ::«• > ; RESOURCE f <mH> 

SR_44 <rw*»m.»i»«|aB«» j»rt> ASSICNMENT.PART: 

<rmurn.wi|AHtit jtrt.b*47 > 

SRJI: <ftt«BTt.iHi|w«*t jtrt.Miy > - <r m«m. MM«M 4 M > j 

<rmurn.Hiipani > 

<r tt wrw.w»i|iiMWjtfi> i <amM> 

$R.(« <ratvTT.«H^»«l> ::*■ Crtaaarw.iypt JMI> tS_ASS1GNED_TO 
<r*t«.iyp*_aM«> I 

Owm.irR.UM> tSj<SSICNn>_TO <pr»«tt_irpaj»«> 


i* 

SR_47: <(«■■*»!> OOMMENT: <Wmt> 

SR'tl: <trtct«Jirpt.ua<.lM> — <pr*c«t#_iyp«_aa»* > ] <t«t*a.‘ 7 PA«*>i 
< pc MM*, ty **_■*•*_!»•» > I <®«*U> 

SR_tt <pn4tt(t.l}ftjta«.l)n> — <prtdtrt.ijr»»j«w>l <pf*M«.t;pt.i»»»>i 
<pr*durt iypf.a»at.liit> | <aul > 

SR_M: <t»»twMat > I <c* **»■*»*_» TP* _»»»•>; 

I <wl> 

SR.I1 : <nwwtt.tyN.*MitJin> ; — <rw»rt«.irp#.M«t> i 
0‘f*wjr«_typ*. *»■*>«> [ <*«#> 

SR .13 Crtk.ijp* _■*•*> ! <f*k_t7P*_pa**>1 

<r •It.irpt.a* > | <a«4> 

IR.il > : “ <»«t> 

SR.lt <prtd«rt.t 7 p».MW> -- <M«> 

SR.U <»*rtrii«.iypt ji«t> <‘«‘> 

SR Si; <mtufrt_t)fpt.»»»t> <t«»> * 

SRI*?: <r»k.irp*_awt> <»— »> ^ 

SR IS: <SUB LANGUAGE I > Rtp»I#r tiprmwi ItafuM* prtdurt »rP* 

(*.!■* SEQUENCE, ALTERNATION ITERATION, x>d PARALLELISM* 

SR.il: <JUB J-ANCUaCE_ 3> A*j Lied *f tirrnal *f r » «i f4SfUt(> nprwtttHiti 

a/ltalay tb* tncriptlca *f t pr«<n< iapWatniiUM. 

SR M: <Ml>7- 


USE^ART 

cwiat.arrtrt, nut U dm, not ll.dtrt, eaapatt.* PROCESS .PLAN, 
bkrr. kid. lid, *: PRODUCT .PLAN; 
c J, < t(a*»*| CONSTRAINT 

DECOMPOSITION PART: 

tnlp DECOMPOSES INTO U„d*ai«», I.Mfi ; 
d DECOMPOSES INTO bid. M ; 

CONNECTION PART: 
bl d«*B USES t, 
bTd»i*« PRODUCES bM. 

It dm«a PRODUCES Bd, 
t*u*\ «rrtr> PRODUCES bl*rr, 
tMU bi «S»n PRODUCES bM. 
rtUBl It tffart PRODUCES k«, 
napvu.t USES Ud. 
rtaptM.f PRODUCES*. 

U d*..*o SEQ (PARALL (H d*Mf» )). 
rout imr, DEPENDS ON bl dm«a. 
cwiat’bi DEPENDS ON U d*«fa, 
c.u#t_a _»B*n DEPENDS ON It dwf», 

ITER (|ll_d.tifa,c_t)). 
t J !S_PRE_CONDIT!ON_FOR bl_d«*%». 
c I IS POST CONDITION FOR tl d#M«a. 
tj IS POST CONDITION.F OR bl_d*M*a. 
e t IS PRE CONDmON.FOR ll_d— ,a; 


IM PLEM ENTATION.PART: 
COMMENT: •»«*«*•«« 


ROLE SPECIFICATION .PART: 

OOMMENT 

4-.(B_n*U ROLE 

INTERFACE r, bM, Id. c.l, « J, cj, e.t 
ACTIONS: bl.dt«ta. B.dtwca 

Moruxt f*ir ROLE 

INTERFACE tj, c_3. e_i, tjt, W*rr, Udl, HcB, * 
bi_df*«(B. Rd 

ACTIONS rouAl.trrtrp, c*Ma»_U_«S*n, e*aat_U_M*rt, 

ctaptu.r 

aio*|rattt r*k: ROLE 
INTERFACE. 

ACTIONS: 


RESOURCE jkSSIGNMENT.PART: 

OOMMENT: 

USE J ART: 

p.l, p_*. P_«. »_4. P_» RESOURCE 

CONNECTION PART: 

p i IS >SSICNED TO bI.dw«B. 
p i IS ASSICNED.TO B.drttfa, 
p 3 lS>SSICNED_TO 
p 1 ISjt.SSICNEO.TO 

pt IS ^ASSICNED.TO aualky.tMarwa.raM, 
pi ISjASSICNED.TO MPtifHi/tb, 
The TAME Project: Towards Improvement-Oriented

Software Environments 

VICTOR R. BASILI, Senior Member, IEEE, and H. DIETER ROMBACH


Abstract— Experience from a dozen years of analyzing software en- 
gineering processes and products is summarized as a set of software 
engineering and measurement principles that argue for software en- 
gineering process models that integrate sound planning and analysis 
into the construction process. 

In the TAME (Tailoring A Measurement Environment) project at
the University of Maryland we have developed such an improvement- 
oriented software engineering process model that uses the goal/ques- 
tion/metric paradigm to integrate the constructive and analytic aspects 
of software development. The model provides a mechanism for for- 
malizing the characterization and planning tasks, controlling and im- 
proving projects based on quantitative analysis, learning in a deeper 
and more systematic way about the software process and product, and 
feeding the appropriate experience back into the current and future 
projects. 

The TAME system is an instantiation of the TAME software engi- 
neering process model as an ISEE (Integrated Software Engineering 
Environment). The first in a series of TAME system prototypes has 
been developed. An assessment of experience with this first limited pro- 
totype is presented including a reassessment of its initial architecture. 
The long-term goal of this building effort is to develop a better under- 
standing of appropriate ISEE architectures that optimally support the 
improvement-oriented TAME software engineering process model. 

Index Terms— Characterization, execution, experience, feedback,
formalizing, goal/question/metric paradigm, improvement paradigm,
integrated software engineering environments, integration of construc- 
tion and analysis, learning, measurement, planning, quantitative anal- 
ysis, software engineering process models, tailoring, TAME project, 
TAME system. 


I. Introduction 

EXPERIENCE from a dozen years of analyzing soft-
ware engineering processes and products is summa- 
rized as a set of ten software engineering and fourteen 
measurement principles. These principles imply the need 
for software engineering process models that integrate 
sound planning and analysis into the construction process. 

Software processes based upon such improvement-ori-
ented software engineering process models need to be tai-
lorable and tractable. The tailorability of a process is the
characteristic that allows it to be altered or adapted to suit
a set of special needs or purposes [64]. The software en-
gineering process requires tailorability because the over-
all project execution model (life cycle model), methods
and tools need to be altered or adapted for the specific
project environment and the overall organization. The
tractability of a process is the characteristic that allows it
to be easily planned, taught, managed, executed, or con-
trolled [64]. Each software engineering process requires
tractability because it needs to be planned, the various
planned activities of the process need to be communicated
to the entire project personnel, and the process needs to
be managed, executed, and controlled according to these
plans. Sound tailoring and tracking require top-down
measurement (measurement based upon operationally de-
fined goals). The goal of a software engineering environ-
ment (SEE) should be to support such tailorable and tract-
able software engineering process models by automating
as much of them as possible.

Manuscript received January 15, 1988. This work was supported in part
by NASA under Grant NSG-5123, the Air Force Office of Scientific Re-
search under Grant F49620-87-0130, and the Office of Naval Research un-
der Grant N00014-85-K-0633 to the University of Maryland. Computer
time was provided in part through the facilities of the Computer Science
Center of the University of Maryland.

The authors are with the Department of Computer Science and the In-
stitute for Advanced Computer Studies, University of Maryland, College
Park, MD 20742.

IEEE Log Number 8820962.

In the TAME (Tailoring A Measurement Environment)
project at the University of Maryland we have developed 
an improvement-oriented software engineering process 
model. The TAME system is an instantiation of this TAME 
software engineering process model as an ISEE (Inte- 
grated SEE). 

It seems appropriate at this point to clarify some of the 
important terms that will be used in this paper. The term 
engineering comprises both development and mainte- 
nance. A software engineering project is embedded in 
some project environment (characterized by personnel, 
type of application, etc.) and within some organization 
(e.g., NASA, IBM). Software engineering within such a 
project environment or organization is conducted accord- 
ing to an overall software engineering process model (one 
of which will be introduced in Section II-B-3). Each in- 
dividual software project in the context of such a software 
engineering process model is executed according to some
execution model (e.g., waterfall model [28], [58], itera-
tive enhancement model [24], spiral model [30]) supple-
mented by techniques (methods, tools). Each specific in-
stance of (a part of) an execution model together with its
supplementing methods and tools is referred to as execu-
tion process (including the construction as well as the
analysis process). In addition, the term process is fre-
quently used as a generic term for various kinds of activ-
ities. We distinguish between constructive and analytic
methods and tools. Whereas constructive methods and
tools are concerned with building products, analytic
methods and tools are concerned with analyzing the con-
structive process and the resulting products. The body of
experience accumulated within a project environment or 
organization is referred to as experience base. There exist 
at least three levels of formalism of such experience bases: 
database (data being individual products or processes), 
information base (information being data viewed through
some superimposed structure), and knowledge base 
(knowledge implying the ability to derive new insights via 
deduction rules). The project personnel are categorized as 
either engineers (e.g., designers, coders, testers) or man-
agers.

This paper is structured into a presentation and discus- 
sion of the improvement-oriented software engineering 
process model underlying the TAME project (Section II), 
its automated support by the TAME system (Section III), 
and the first TAME system prototype (Section IV). In the 
first part of this paper we list the empirically derived les- 
sons learned (Section II-A) in the form of software engi- 
neering principles (Section II-A-1), measurement princi- 
ples (Section II-A-2), and motivate the TAME project by 
stating several implications derived from those principles 
(Section II-A-3). The TAME project (Section II-B) is pre- 
sented in terms of the improvement paradigm (Section 
II-B-1), the goal/question/metric paradigm as a mecha- 
nism for formalizing the improvement paradigm (Section 
II-B-2), and the TAME project model as an instantiation 
of both paradigms (Section II-B-3). In the second part of 
this paper we introduce the TAME system as an approach
to automatically supporting the TAME software engi- 
neering process model (Section III). The TAME system 
is presented in terms of its requirements (Section III-A)
and architecture (Section III-B). In the third part of this 
paper, we introduce the first TAME prototype (Section 
IV) with respect to its functionality and our first experi- 
ences with it. 

II. Software Engineering Process 

Our experience from measuring and evaluating soft- 
ware engineering processes and products in a variety of 
project environments has been summarized in the form of 
lessons learned (Section II-A). Based upon this experi- 
ence the TAME project has produced an improvement- 
oriented process model (Section II-B). 

A. Lessons Learned from Past Experience

We have formulated our experience as a set of software 
engineering principles (Section II-A-1) and measurement
principles (Section II-A-2). Based upon these principles a 
number of implications for sound software engineering 
process models have been derived (Section II-A-3). 

1) Software Engineering Principles: The first five 
software engineering principles address the need for de- 
veloping quality a priori by introducing engineering dis- 
cipline into the field of software engineering: 

(P1) We need to clearly distinguish between the role of
constructive and analytic activities. Only improved con- 
struction processes will result in higher quality software. 
Quality cannot be tested or inspected into software. An-
alytic processes (e.g., quality assurance) cannot serve as
a substitute for constructive processes but will provide 
control of the constructive processes [27], [37], [61]. 

(P2) We need to formalize the planning of the con- 
struction process in order to develop quality a priori [3], 
[16], [19], [25]. Without such plans the trial and error 
approach can hardly be avoided. 

(P3) We need to formalize the analysis and improve- 
ment of construction processes and products in order to 
guarantee an organized approach to software engineering 
[3], [25]. 

(P4) Engineering methods require analysis to deter- 
mine whether they are being performed appropriately, if 
at all. This is especially important because most of these 
methods are heuristic rather than formal [42], [49], [66].

(P5) Software engineers and managers need real-time 
feedback in order to improve the construction processes 
and products of the ongoing project. The organization 
needs post-mortem feedback in order to improve the con- 
struction processes and products for future projects [66]. 

The remaining five software engineering principles ad-
dress the need for tailoring of planning and analysis pro-
cesses due to changing needs from project to project and
environment to environment:

(P6) All project environments and products are differ- 
ent in some way [2], [66]. These differences must be made 
explicit and taken into account in the software execution 
processes and in the product quality goals [3], [16], [19], 
[25].

(P7) There are many execution models for software en- 
gineering. Each execution model needs to be tailored to 
the organization and project needs and characteristics [2], 
[13], [16], [66]. 

(P8) We need to formalize the tailoring of processes 
toward the quality and productivity goals of the project 
and the characteristics of the project environment and the 
organization [16]. It is not easy to apply abstractly defined 
methods to specific environments. 

(P9) This need for tailoring does not mean starting from 
scratch each time. We need to reuse experience, but only 
after tailoring it to the project [ 1 1, ffll'161 [7], [18], [32]. 

(P10) Because of the constant need for tailoring, man- 
agement control is crucial and must be flexible. Manage- 
ment needs must be supported in this software engineer- 
ing process. 

A more detailed discussion of these software engineer- 
ing principles is contained in [17]. 

2) Software Measurement Principles: The first four 
measurement principles address the purpose of the mea- 
surement process, i.e., why should we measure, what 
should we measure, for whom should we measure: 

(M1) Measurement is an ideal mechanism for charac-
terizing, evaluating, predicting, and providing motivation
for the various aspects of software construction processes
and products [3], [4], [9], [16], [21], [25], [48], [56],
[57]. It is a common mechanism for relating these multi-
ple aspects.

(M2) Measurements must be taken on both the soft-
ware processes and the various software products [1], [5],
[14], [29], [38], [40], [42]-[44], [47], [54]-[56], [65], 
[66]. Improving a product requires understanding both the 
product and its construction processes. 

(M3) There are a variety of uses for measurement. The 
purpose of measurement should be clearly stated. We can 
use measurement to examine cost effectiveness, reliabil-
ity, correctness, maintainability, efficiency, user friendli-
ness, etc. [8]-[10], [13], [14], [16], [20], [23], [25], [41],
[53], [57], [61]. 

(M4) Measurement needs to be viewed from the appro- 
priate perspective. The corporation, the manager, the de- 
veloper, the customer’s organization and the user each 
view the product and the process from different perspec- 
tives. Thus they may want to know different things about 
the project and to different levels of detail [3], [16], [19], 
[25], [66]. 

The remaining ten measurement principles address met- 
rics and the overall measurement process. The first two 
principles address characteristics of metrics (i.e., what 
kinds of metrics, how many are needed), while the latter 
eight address characteristics of the measurement process 
(i.e., what should the measurement process look like, how 
do we support characterization, planning, construction, 
and learning and feedback): 

(M5) Subjective as well as objective metrics are re- 
quired. Many process, product and environment aspects 
can be characterized by objective metrics (e.g., product 
complexity, number of defects or effort related to pro- 
cesses). Other aspects cannot be characterized objectively 
yet (e.g., experience of personnel, type of application, 
understandability of processes and products); but they can 
at least be categorized on a quantitative (nominal) scale 
to a reasonable degree of accuracy [4], [5], [16], [48], 
[56]. 

(M6) Most aspects of software processes and products 
are too complicated to be captured by a single metric. For 
both definition and interpretation purposes, a set of met- 
rics (a metric vector) that frame the purpose for measure- 
ment needs to be defined [9]. 

(M7) The development and maintenance environments 
must be prepared for measurement and analysis. Planning 
is required and needs to be carefully integrated into the 
overall software engineering process model. This plan-
ning process must take into account the experimental de-
sign appropriate for the situation [3], [14], [19], [22],
[66].

(M8) We cannot just use models and metrics from other 
environments as defined. Because of the differences 
among execution models (principle P7), the models and 
metrics must be tailored for the environment in which they 
will be applied and checked for validity in that environ- 
ment [2], [6]-[8], [12], [23], [31], [40], [47], [50], [51], 
[62].

(M9) The measurement process must be top-down 
rather than bottom-up in order to define a set of opera- 
tional goals, specify the appropriate metrics, permit valid 


contextual interpretation and analysis, and provide feed- 
back for tailorability and tractability [3], [16], [19], [25]. 

(M10) For each environment there exists a character- 
istic set of metrics that provides the needed information 
for definition and interpretation purposes [21]. 

(M11) Multiple mechanisms are needed for data col-
lection and validation. The nature of the data to be col-
lected (principle M5) determines the appropriate mecha-
nisms [4], [25], [48], e.g., manually via forms or
interviews, or automatically via analyzers.

(M12) In order to evaluate and compare projects and 
to develop models we need a historical experience base. 
This experience base should characterize the local envi- 
ronment [4], [13], [25], [34], [44], [48]. 

(M13) Metrics must be associated with interpretations, 
but these interpretations must be given in context [3], [16], 
[19], [25], [34], [56]. 

(M14) The experience base should evolve from a da- 
tabase into a knowledge base (supported by an expert sys- 
tem) to formalize the reuse of experience [11], [14]. 

A more detailed discussion of these measurement prin- 
ciples is contained in [17]. 
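Principles M5 and M6 can be illustrated with a small metric vector that combines objective metrics with a subjective one recorded on a nominal scale. The following Python sketch is a hypothetical illustration only: the class names, the chosen metrics, and the scale values are assumptions, not part of the TAME model.

```python
from dataclasses import dataclass
from enum import Enum


class Experience(Enum):
    """Subjective aspect recorded on a nominal scale (assumed categories)."""
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass(frozen=True)
class MetricVector:
    """A set of metrics that together frame one measurement purpose."""
    product_complexity: int        # objective
    defects_found: int             # objective
    effort_hours: float            # objective
    team_experience: Experience    # subjective, nominal

    def defects_per_hour(self) -> float:
        """Derived value; only meaningful together with the other components."""
        return self.defects_found / self.effort_hours


# Assumed sample data for one module.
vector = MetricVector(product_complexity=12, defects_found=6,
                      effort_hours=40.0, team_experience=Experience.HIGH)
```

Interpreting `vector.defects_per_hour()` in isolation would be misleading; the complexity and experience components supply the context that principle M13 asks for.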

3) Implications: Clearly this set of principles is not 
complete. However, these principles provide empirically 
derived insight into the limitations of traditional process 
models. We will give some of the implications of these 
principles with respect to the components that need to be 
included in software process models, essential character- 
istics of these components, the interaction of these com- 
ponents, and the needed automated support. Although 
there is a relationship between almost all principles and 
the derived implications, we have referenced for each im- 
plication only those principles that are related most di- 
rectly. 

Based upon our set of principles it is clear that we need
to better understand the software construction process and
product (e.g., principles P1, P4, P6, M2, M5, M6, M8,
M9, M10, M12). Such an understanding will allow us to
plan what we need to do and improve over our current
practices (e.g., principles P1, P2, P3, P7, P8, M3, M4,
M7, M9, M14). To make those plans operational, we
need to specify how we are going to affect the construc-
tion processes and their analysis (e.g., principles P1, P2,
P3, P4, P7, P8, M7, M8, M9, M14). The execution of
these prescribed plans involves the construction of prod-
ucts and the analysis of the constructive processes and
resulting products (e.g., principles P1, P7).

All these implications need to be integrated in such a 
way that they allow for sound learning and feedback so 
that we can improve the software execution processes and 
products (e.g., principles P1, P3, P4, P5, P9, P10, M3,
M4, M9, M12, M13, M14). This interaction requires the
integration of the constructive and analytic aspects of the 
software engineering process model (e.g., principles P2, 
M7, M9). 

The components and their interactions need to be for- 
malized so they can be supported properly by an ISEE 
(e.g., principles P2, P3, P8, P9, M9). This formalization
must include a structuring of the body of experience so 
that characterization, planning, learning, feedback, and 
improvement can take place (e.g., principles P2, P3, P8, 
P9, M9). An ideal mechanism for supporting all of these
components and their interactions is quantitative analysis
(e.g., principles P3, P4, M1, M2, M5, M6, M8, M9,
M10, M11, M13).

B. A Process Model: The TAME Project 

The TAME (Tailoring A Measurement Environment) 
project at the University of Maryland has produced a soft- 
ware engineering process model (Section II-B-3) based 
upon our empirically derived lessons learned. This soft- 
ware engineering process model is based upon the im- 
provement (Section II-B-1) and goal/question/metric par- 
adigms (Section II-B-2). 

1) Improvement Paradigm: The improvement para-
digm for software engineering processes reflects the im-
plications stated in Section II-A-3. It consists of six major
steps [3]:

(I1) Characterize the current project environment.

(I2) Set up goals and refine them into quantifiable ques-
tions and metrics for successful project performance and
improvement over previous project performances.

(I3) Choose the appropriate software project execution
model for this project and supporting methods and tools.

(I4) Execute the chosen processes and construct the
products, collect the prescribed data, validate it, and pro-
vide feedback in real-time.

(I5) Analyze the data to evaluate the current practices,
determine problems, record the findings, and make rec-
ommendations for improvement.

(I6) Proceed to Step I1 to start the next project, armed
with the experience gained from this and previous proj-
ects.

This paradigm is aimed at providing a basis for corporate 
learning and improvement. Improvement is only possible 
if we a) understand what the current status of our 
environment is (step I1), b) state precise improvement 
goals for the particular project and quantify them for the 
purpose of control (step I2), c) choose the appropriate 
process execution models, methods, and tools in order to 
achieve these improvement goals (step I3), execute and 
monitor the project performance thoroughly (step I4), and 
assess it (step I5). Based upon the assessment results we 
can provide feedback into the ongoing project or into the 
planning step of future projects (steps I5 and I6).
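
The six steps can be read as a loop in which each project 
starts from the experience left behind by its predecessors. 
The following Python sketch is only an illustration of that 
reading; the names `ProjectHistory` and `run_project` and the 
use of a single defect-rate figure are assumptions made for 
the example and are not part of the TAME project.

```python
from dataclasses import dataclass, field


@dataclass
class ProjectHistory:
    """Hypothetical store of findings carried from project to project."""
    findings: list = field(default_factory=list)


def run_project(name, measured_defect_rate, history):
    """Walk one project through steps I1-I6 (illustrative only)."""
    # I1: characterize the environment from prior experience.
    prior = [f["defect_rate"] for f in history.findings]
    baseline = sum(prior) / len(prior) if prior else None
    # I2: set a quantifiable goal relative to the baseline
    # (the first project has no baseline and trivially meets its goal).
    goal = baseline if baseline is not None else measured_defect_rate
    # I3/I4: choosing and executing the process is outside this sketch;
    # the collected data arrives as measured_defect_rate.
    # I5: analyze the data and record the findings.
    finding = {
        "project": name,
        "defect_rate": measured_defect_rate,
        "met_goal": measured_defect_rate <= goal,
    }
    history.findings.append(finding)
    # I6: the caller starts the next project with the same history.
    return finding
```

Passing the same `ProjectHistory` object to successive calls 
is what stands in for step I6 here.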

2) Goal/Question/Metric Paradigm: The goal/question/metric 
(GQM) paradigm is intended as a mechanism
for formalizing the characterization, planning, construc- 
tion, analysis, learning and feedback tasks. It represents 
a systematic approach for setting project goals (tailored 
to the specific needs of an organization) and defining them 
in an operational and tractable way. Goals are refined into 
a set of quantifiable questions that specify metrics. This 
paradigm also supports the analysis and integration of 
metrics in the context of the questions and the original
goal. Feedback and learning are then performed in the 
context of the GQM paradigm. 

The process of setting goals and refining them into 
quantifiable questions is complex and requires experi- 
ence. In order to support this process, a set of templates 
for setting goals, and a set of guidelines for deriving ques- 
tions and metrics has been developed. These templates 
and guidelines reflect our experience from having applied 
the GQM paradigm in a variety of environments (e.g.,
NASA [4], [17], [48], IBM [60], AT&T, Burroughs [56], 
and Motorola). We received additional feedback from 
Hewlett Packard where the GQM paradigm has been used 
without our direct assistance [39]. It needs to be stressed 
that we do not claim that these templates and guidelines 
are complete; they will most likely change over time as 
our experience grows. Goals are defined in terms of 
purpose, perspective, and environment. Different sets of 
guidelines exist for defining product-related and 
process-related questions. Product-related questions are 
formulated for the purpose of defining the product (e.g., 
physical attributes, cost, changes and defects, and context), 
defining the quality perspective of interest (e.g., reliability, 
user friendliness), and providing feedback from the 
particular quality perspective. Process-related questions are 
formulated for the purpose of defining the process (quality 
of use, domain of use), defining the quality perspective 
of interest (e.g., reduction of defects, cost effectiveness 
of use), and providing feedback from the particular 
quality perspective.

• Templates/Guidelines for Goal Definition: 

Purpose: To (characterize, evaluate, predict, moti- 
vate, etc.) the (process, product, model, metric, etc.) in 
order to (understand, assess, manage, engineer, learn, 
improve, etc.) it. 

Example: To evaluate the system testing methodology 
in order to improve it. 

Perspective: Examine the (cost, effectiveness, 
correctness, defects, changes, product metrics, reliability, 
etc.) from the point of view of the (developer, manager, 
customer, corporate perspective, etc.).

Example: Examine the effectiveness from the 
developer’s point of view.

Environment: The environment consists of the fol- 
lowing: process factors, people factors, problem factors, 
methods, tools, constraints, etc. 

Example: The product is an operating system that must 
fit on a PC, etc. 
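
The goal definition template lends itself to a small record 
type whose fields are the slots of the template. The sketch 
below is one possible encoding in Python, not the TAME 
representation; the class name `Goal` and its field names are 
assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Goal:
    """One GQM goal following the purpose/perspective/environment template."""
    action: str        # e.g. "evaluate"
    obj: str           # e.g. "the system testing methodology"
    intent: str        # e.g. "improve"
    focus: str         # e.g. "effectiveness"
    viewpoint: str     # e.g. "developer"
    environment: str   # free-text description of the environment factors

    def purpose(self) -> str:
        """Render the Purpose slot of the template."""
        return f"To {self.action} {self.obj} in order to {self.intent} it."

    def perspective(self) -> str:
        """Render the Perspective slot of the template."""
        return (f"Examine the {self.focus} from the point of view "
                f"of the {self.viewpoint}.")
```

Filling the fields with the example values from the template 
reproduces the example purpose sentence given above.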

• Guidelines for Product-Related Questions: 

For each product under study there are three major 
subgoals that need to be addressed: 1) definition of the 
product, 2) definition of the quality perspectives of inter- 
est, and 3) feedback related to the quality perspectives of 
interest. 

Definition of the product includes questions related to 
physical attributes (a quantitative characterization of the 
product in terms of physical attributes such as size, 
complexity, etc.), cost (a quantitative characterization of the
resources expended related to this product in terms of ef- 
fort, computer time, etc.), changes and defects (a quan- 
titative characterization of the errors, faults, failures, ad- 
aptations, and enhancements related to this product), and 
context (a quantitative characterization of the customer 
community using this product and their operational pro- 
files). 

Quality perspectives of interest includes, for each 
quality perspective of interest (e.g., reliability, user 
friendliness), questions related to the major model(s) used (a
quantitative specification of the quality perspective of in- 
terest), the validity of the model for the particular envi- 
ronment (an analysis of the appropriateness of the model 
for the particular project environment), the validity of the 
data collected (an analysis of the quality of data), the 
model effectiveness (a quantitative characterization of the 
quality of the results produced according to this model), 
and a substantiation of the model (a discussion of whether 
the results are reasonable from various perspectives). 

Feedback includes questions related to improving the 
product relative to the quality perspective of interest (a 
quantitative characterization of the product quality, major 
problems regarding the quality perspective of interest, and 
suggestions for improvement during the ongoing project 
as well as during future projects). 

• Guidelines for Process-Related Questions:

For each process under study, there are three major 
subgoals that need to be addressed: 1) definition of the 
process, 2) definition of the quality perspectives of inter- 
est, and 3) feedback from using this process relative to 
the quality perspective of interest. 

Definition of the process includes questions related to 
the quality of use (a quantitative characterization of the 
process and an assessment of how well it is performed), 
and the domain of use (a quantitative characterization of 
the object to which the process is applied and an analysis 
of the process performer's knowledge concerning this 
object).

Quality perspectives of interest follows a pattern 
similar to the corresponding product-oriented subgoal, 
including, for each quality perspective of interest (e.g., 
reduction of defects, cost effectiveness), questions related to 
the major model(s) used, the validity of the model for the 
particular environment, the validity of the data collected, 
the model effectiveness, and the substantiation of the 
model.

Feedback follows a pattern similar to the correspond- 
ing product-oriented subgoal. 

• Guidelines for Metrics, Data Collection, and 
Interpretation: 

The choice of metrics is determined by the quantifiable 
questions. The guidelines for questions acknowledge the 
need for generally more than one metric (principle M6), 
for objective and subjective metrics (principle M5), and 
for associating interpretations with metrics (principle 
M13). The actual GQM models generated from these 
templates and guidelines will differ from project to project 
and organization to organization (principle M6). This 
reflects their being tailored for the different needs in 
different projects and organizations (principle M4). Depending 
on the type of each metric, we choose the appropriate 
mechanisms for data collection and validation (principle 
M11). As goals, questions and metrics provide for 
tractability of the (top-down) definitional quantification 
process, they also provide for the interpretation context 
(bottom-up). This integration of definition with interpretation 
allows for the interpretation process to be tailored to the 
specific needs of an environment (principle M8).
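
The top-down and bottom-up directions can be made concrete 
with a small tree of goal, questions, and metrics. The Python 
sketch below is a minimal illustration under assumed names 
(`Metric`, `Question`, `GQMModel`); it is not the structure 
used by the TAME system. Definition walks the tree downward to 
find the data to collect; interpretation walks upward, reading 
each value through the rule attached to its metric.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Metric:
    name: str
    # Interpretation rule attached to the metric (cf. principle M13).
    interpret: Callable[[float], str]


@dataclass
class Question:
    text: str
    metrics: List[Metric] = field(default_factory=list)


@dataclass
class GQMModel:
    goal: str
    questions: List[Question] = field(default_factory=list)

    def required_metrics(self) -> List[str]:
        """Top-down: which data must be collected for this goal."""
        return [m.name for q in self.questions for m in q.metrics]

    def interpret(self, data: Dict[str, float]) -> Dict[str, List[str]]:
        """Bottom-up: read collected values in the context of each question."""
        return {
            q.text: [m.interpret(data[m.name])
                     for m in q.metrics if m.name in data]
            for q in self.questions
        }
```

Metrics for which no value was collected are simply skipped 
during interpretation; a fuller treatment would report them.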

3) Improvement-Oriented Process Model: The 
TAME software engineering process model is an instan- 
tiation of the improvement paradigm. The GQM para- 
digm provides the necessary integration of the individual 
components of this model. The TAME software engi- 
neering process model explicitly includes components for 
(C1) the characterization of the current status of a project
environment, (C2) the planning for improvement inte- 
grated into the execution of projects, (C3) the execution 
of the construction and analysis of projects according to 
the project plans, and (C4) the recording of experience 
into an experience base. The learning and feedback mech- 
anism (C5) is distributed throughout the model within and 
across the components as information flows from one
component to another. Each of these tasks must be dealt 
with from a constructive and analytic perspective. Fig. 1 
contains a graphical representation of the improvement- 
oriented TAME process model. The relationships (arcs) 
among process model components in Fig. 1 represent in- 
formation flow. 

(C1) Characterization of the current environment is 
required to understand the various factors that influence the
current project environment. This task is important in or- 
der to define a starting point for improvement. Without 
knowing where we are, we will not be able to judge 
whether we are improving in our present project. We dis- 
tinguish between the constructive and analytic aspects of 
the characterization task to emphasize that we not only 
state the environmental factors but analyze them to the de- 
gree possible based upon data and other forms of infor- 
mation from prior projects. This characterization task 
needs to be formalized. 

(C2) Planning is required to understand the project 
goals, execution needs, and project focus for learning and 
feedback. This task is essential for disciplined software 
project execution (i.e., executing projects according to 
precise specifications of processes and products). It pro- 
vides the basis for improvement relative to the current sta- 
tus determined during characterization. In the planning 
task, we distinguish between the constructive and analytic 
as well as the “what” and “how” aspects of planning. 
Based upon the GQM paradigm all these aspects are highly 
interdependent and performed as a single task. The 
development of quantitatively analyzable goals is an 
iterative process. However, we formulate the four planning 
aspects as four separate components to emphasize the 
differences between creating plans for development and 
making those plans analyzable, as well as between stating 
what it is you want to accomplish and stating how you 
plan to tailor the processes and metrics to do it.

Fig. 1. The improvement-oriented TAME software process model.

(C2.1) “What” Planning deals with choosing, 
assigning priorities, and operationally defining, to the 
degree possible, the project goals from the constructive and
analytic perspectives. The actual goal setting is an instan- 
tiation of the front-end of the GQM paradigm (the tem- 
plates/guidelines for goal definition). The constructive 
perspective addresses the definition of project goals such 
as on-time delivery, the appropriate functionality to sat- 
isfy the user, and the analysis of the execution processes 
we are applying. Some of these goals might be stated as 
improvement goals over the current state-of-the-practice 
as characterized in component C1. These goals should be
prioritized and operationally defined to the extent possible 
without having chosen the particular construction models, 
methods and tools yet. The analytic perspective addresses 
analysis procedures for monitoring and controlling 
whether the goals are met. This analytic goal perspective 
should prescribe the necessary learning and feedback 
paths. It should be operationally defined to the extent al- 
lowed by the degree of precision of the constructive goal 
perspective. 

(C2.2) “How” Planning is based upon the results 
from the “what” planning (providing for the purpose and 
perspective of a goal definition according to the GQM 
paradigm front-end) and the characterization of the 
environment (providing for the environment part of a goal 
definition according to the GQM paradigm front-end). The 
“how” planning involves the choice of an appropriately 
tailored execution model, methods and tools that permit 
the building of the system in such a way that we can 
analyze whether we are achieving our stated goals. The 
particular choice of construction processes, methods and tools 
(component C2.2.1) goes hand in hand with fine-tuning 
the analysis procedures derived during the analytic 
perspective of the “what” planning (component C2.2.2).

(C2.2.1) Planning for construction includes choosing 
the appropriate execution model, methods and tools 
to fulfill the project goals. It should be clear that effective 
planning for construction depends on well-defined project 
goals from both the constructive and analytic perspective 
(component C2.1).

(C2.2.2) Planning for analysis addresses the 
fine-tuning of the operational definition of the analytic goal 
perspective (derived as part of component C2.1) towards 
the specific choices made during planning for 
construction (C2.2.1). The actual planning for analysis is an 
instantiation of the back-end of the GQM paradigm; details 
need to be filled in (e.g., quantifiable questions, metrics) 
based upon the specific methods and tools chosen.

(C3) Execution must integrate the construction 
(component C3.1) with the analysis (component C3.2). 
Analysis (including measurement) cannot be an add-on but
must be part of the execution process and drive the con- 
struction. The execution plans derived during the plan- 
ning task are supposed to provide for the required inte- 
gration of construction and analysis. 

(C4) The Experience Base includes the entire body of 
experience that is actively available to the project. We can 
characterize this experience according to the following di- 
mensions: a) the degree of precision/detail, and b) the de- 
gree to which it is tailored to meet the specific needs of 
the project (context). The precision/detail dimension in- 
volves the level of detail of the experimental design and 
the level and quality of data collected. On one end of the 
spectrum we have detailed objective quantitative data that 
allows us to build mathematically tractable models. On 
the other end of the spectrum we have interviews and 
qualitative information that provide guidelines and 
“lessons learned” documents, and permit the better 
formulation of goals and questions. The level of precision and
detail affects our level of confidence in the results of the 
experiment as well as the cost of the data collection pro- 
cess. Clearly priorities play an important role here. The 
context dimension involves whether the focus is to learn
about the specific project, projects within a specific ap- 
plication domain or general truths about the software pro- 
cess or product (requires the incorporation of formalized 
experience from prior projects into the experience base). 
Movement across the context dimension assumes an abil- 
ity to generalize experience to a broader context than the 
one studied, or to tailor experience to a specific project. 
The better this experience is packaged, the better our un- 
derstanding of the environment. Maintaining a body of 
experience acquired during a number of projects is one of 
the prerequisites for learning and feedback across envi- 
ronments. 

(C5) Learning and Feedback are integrated into the 
TAME process model in various ways. They are based 
upon the experimental model for learning consisting of a 
set of steps, starting with focused objectives, which are 
turned into specific hypotheses, followed by running 
experiments to validate the hypotheses in the appropriate 
environment. The model is iterative; as we learn from 
experimentation, we are better able to state our focused 
objectives and we change and refine our hypotheses.

This model of learning is incorporated into the GQM 
paradigm where the focused objectives are expressed as 
goals, the hypotheses are expressed as questions written 
to the degree of formalism required, and the experimental 
environment is the project, a set of projects in the same 
domain, or a corporation representing a general environ- 
ment. Clearly the GQM paradigm is also iterative. 

The feedback process helps generate the goals to influ- 
ence one or more of the components in the process model, 
e.g., the characterization of the environment, or the anal- 
ysis of the construction processes or products. The level 
of confidence we have in feeding back the experience to 
a project or a corporate environment depends upon the 
precision/detail level of the experience base (component 
C4) and the generality of the experimental environment 
in which it was gathered. 

The learning and feedback process appears in the model 
as the integration of all the components and their inter- 
actions as they are driven by the improvement and GQM 
paradigms. The feedback process can be channeled to the 
various components of the current project and to the cor- 
porate experience base for use in future projects. 

Most traditional software engineering process models 
address only a subset of the individual components of this 
model; in many cases they cover just the constructive 
aspects of characterization (component C1), “how” 
planning (component C2.2.1), and execution (component 
C3.1). More recently developed software engineering 
process models address the constructive aspect of 
execution (component C3.1) in more sophisticated ways (e.g., 
new process models [24], [30], [49], combine various 
process dimensions such as technical, managerial, 
contractual [36], or provide more flexibility as far as the use of 
methods and tools is concerned, for example via the 
automated generation of tools [45], [63]), or they add 
methods and tools for choosing the analytical processes, 
methods, and tools (component C2.2.2) as well as actually 
performing analysis (component C3.2) [52], [59]. 
However, all these process models have in common the lack 
of completely integrating all their individual components 
in a systematic way that would permit sound learning and 
feedback for the purpose of project control and 
improvement of corporate experience.

III. Automated Support through ISEEs: The 
TAME System 

The goal of an Integrated Software Engineering 
Environment (ISEE) is to effectively support the 
improvement-oriented software engineering process model described in 
Section II-B-3. An ISEE must support all the model 
components (characterization, planning, execution, and the 
experience base), all the local interactions between model 
components, the integration and formalization of the 
GQM paradigm, and the necessary transitions between the 
context and precision/detail dimension boundaries in the 
experience base. Supporting the transitions along the 
experience base dimensions is needed in order to allow for 
sound learning and feedback as outlined in Section II-B-3 
(component C5).

The TAME system will automate as many of the com- 
ponents, interactions between components and supporting 
mechanisms of the TAME process model as possible. The 
TAME system development activities will concentrate on 
all but the construction component (component C3. 1) with 
the eventual goal of interfacing with constructive SEEs. 
In this section we present the requirements and the initial 
architecture for the TAME system. 

A. Requirements 

The requirements for the TAME system can be derived 
from Section II-B-3 in a natural way. These requirements 
can be divided into external requirements (defined by and 
of obvious interest to the TAME system user) and internal 
requirements (defined by the TAME design team and re- 
quired to support the external requirements properly). 

The first five (external) requirements include support 
for the characterization and planning components of the 
TAME model by automating an instantiation of the GQM 
paradigm, for the analysis component by automating data 
collection, data validation and analysis, and the learning 
and feedback component by automating interpretation and 
organizational learning. We will list for each external 
TAME system requirement the TAME process model 
components of Section II-B-3 from which it has been 
derived.

External TAME requirements: 

(R1) A mechanism for defining the constructive and 
analytic aspects of project goals in an operational and 
quantifiable way (derived from components C1, C2.1, 
C2.2.2, C3.2).

We use the GQM paradigm and its templates for 
defining goals operationally and refining them into quantifiable
questions and metrics. The selection of the appropriate 
GQM model and its tailoring needs to be supported. The 
user will either select an existing model or generate a new 
one. A new model can be generated from scratch or by 
reusing pieces of existing models. The degree to which 
the selection, generation, and reuse tasks can be 
supported automatically depends largely on the degree to 
which the GQM paradigm and its templates can be 
formalized. The user needs to be supported in defining 
his/her specific goals according to the goal definition 
template. Based on each goal definition, the TAME system 
will search for a model in the experience base. If no 
appropriate model exists, the user will be guided in 
developing one. Based on the tractability of goals into subgoals
and questions the TAME system will identify reusable 
pieces of existing models and compose as much of an ini- 
tial model as possible. This initial model will be com- 
pleted with user interaction. For example, if a user wants 
to develop a model for assessing a system test method 
used in a particular environment, the system might com- 
pose an initial model by reusing pieces from a model as- 
sessing a different test method in the same environment, 
and from a model for assessing the same system test 
method in a different environment. A complete GQM 
model includes rules for interpretation of metrics and
guidelines for collecting the prescribed data. The TAME 
system will automatically generate as much of this infor- 
mation as possible. 
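
The search-then-compose behavior described for R1 might look 
roughly as follows. This Python sketch is a deliberately 
naive illustration: it assumes that models are plain 
dictionaries, that a goal is identified by an "object" and an 
"environment" key, and that reuse means copying the questions 
of any stored model sharing one of the two. None of these 
choices comes from the TAME design.

```python
def find_or_compose(goal, experience_base):
    """Return (model, complete) for a goal, given a list of stored models.

    goal and each stored model's "goal" are dicts with the hypothetical
    keys "object" (e.g. a test method) and "environment".
    """
    for model in experience_base:
        if model["goal"] == goal:
            return model, True  # exact match: reuse the model as is
    # Otherwise collect questions from models that share either the
    # object or the environment; the user completes the remainder.
    questions = []
    for model in experience_base:
        shares = any(model["goal"][k] == goal[k]
                     for k in ("object", "environment"))
        if shares:
            for q in model["questions"]:
                if q not in questions:
                    questions.append(q)
    return {"goal": goal, "questions": questions}, False
```

The second element of the result tells the caller whether 
user interaction is still needed to complete the model.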

(R2) The automatic and manual collection of data and 
the validation of manually collected data (derived from 
component C3.2). 

The collection of all product-related data (e.g., lines of 
code, complexity) and certain process-related data (e.g., 
number of compiler runs, number of test runs) will be 
completely automated. Automation requires an interface 
with construction-oriented SEEs. The collection of many 
process-related data (e.g., effort, changes) and subjective
data (e.g., experience of personnel, characteristics of 
methods used) cannot be automated. The schedule ac- 
cording to which measurement tools are run needs to be 
defined as part of the planning activity. It is possible to 
collect data whenever they are needed, periodically (e.g., 
always at a particular time of the day), or whenever 
changes of products occur (e.g., whenever a new product 
version is entered into the experience base all the related 
metrics are recomputed). All manually collected data need 
to be validated. Validating whether data are within their 
defined range, whether all the prescribed data are col- 
lected, and whether certain integrity rules among data are 
maintained will be automated. Some of the measurement 
tools will be developed as part of the TAME system de- 
velopment project, others will be imported. The need for 
importing measurement tools will require an effective in- 
terconnection mechanism (probably an interconnection 
language) for integrating tools developed in different lan- 
guages. 
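
The three automated validation checks named above (range, 
completeness, and integrity rules among data) can be sketched 
in a few lines of Python. The function name, the shape of the 
`spec` dictionary, and the representation of integrity rules 
as predicates are assumptions made for this example only.

```python
def validate_record(record, spec, integrity_rules=()):
    """Return a list of problems found in one manually collected record.

    spec maps each prescribed field to an inclusive (low, high) range.
    integrity_rules is a sequence of (description, predicate) pairs.
    """
    problems = []
    for name, (low, high) in spec.items():
        if name not in record:
            problems.append(f"missing: {name}")       # completeness check
        elif not low <= record[name] <= high:
            problems.append(f"out of range: {name}")  # range check
    # Integrity rules are only evaluated once every field is present.
    if all(name in record for name in spec):
        for description, holds in integrity_rules:
            if not holds(record):
                problems.append(f"integrity: {description}")
    return problems
```

An empty result means the record passed all three kinds of 
check.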

(R3) A mechanism for controlling measurement and 
analysis (derived from component C3.2). 
A GQM model is used to specify and control the 
execution of a particular analysis and feedback session. 
According to each GQM model, the TAME system must
trigger the execution of measurement tools for data col- 
lection, the computation of all metrics and distributions 
prescribed, and the application of statistical procedures. 
If certain metrics or distributions cannot be computed due 
to the lack of data or measurement tools, the TAME 
system must inform the user.
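
A minimal Python sketch of such a controlled session follows. 
It assumes, purely for illustration, that measurement tools 
are functions keyed by metric name and that a tool signals 
missing data by raising `KeyError`; the function `run_session` 
is hypothetical.

```python
def run_session(required_metrics, tools, raw_data):
    """Compute each prescribed metric or report why it cannot be computed.

    tools maps a metric name to a function of raw_data; a tool signals
    missing data by raising KeyError. Both conventions are assumptions.
    """
    results, messages = {}, []
    for name in required_metrics:
        tool = tools.get(name)
        if tool is None:
            messages.append(f"{name}: no measurement tool available")
            continue
        try:
            results[name] = tool(raw_data)
        except KeyError as missing:
            messages.append(f"{name}: missing data {missing}")
    return results, messages
```

The list of required metrics would come from the GQM model, so 
that nothing is measured outside the context of a goal.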

(R4) A mechanism for interpreting analysis results in a 
context and providing feedback for the improvement of 
the execution model, methods and tools (derived from 
components C3.2, C5).

We use a GQM model to define the rules and context 
for interpretation of data and for feedback in order to re- 
fine and improve execution models, methods and tools. 
The degree to which interpretation can be supported de- 
pends on our understanding of the software process and 
product, and the degree to which we express this under- 
standing as formal rules. Today, interpretation rules exist 
only for some of the aspects of interest and are only valid 
within a particular project environment or organization. 
However, interpretation guided by GQM models will en- 
able an evolutionary learning process resulting in better 
rules for interpretation in the future. The interpretation 
process can be much more effective provided historical 
experience is available allowing for the generation of his- 
torical baselines. In this case we can at least identify 
whether observations made during the current project de- 
viate from past experience or not. 
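
One simple way to check an observation against a historical 
baseline is to compare it with the mean and spread of past 
values. The Python sketch below does this; the function name 
and the two-standard-deviation threshold are arbitrary 
choices for illustration and are not prescribed by the text.

```python
from statistics import mean, stdev


def deviates_from_baseline(history, observed, tolerance=2.0):
    """Flag an observation outside mean +/- tolerance * stdev of history.

    The default tolerance of two standard deviations is illustrative.
    """
    if len(history) < 2:
        raise ValueError("need at least two past observations")
    center, spread = mean(history), stdev(history)
    return abs(observed - center) > tolerance * spread
```

A flagged value only says that the current project differs 
from past experience; deciding whether that is good or bad 
still requires the interpretation rules of the GQM model.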

(R5) A mechanism for learning in an organization (de- 
rived from components C4, C5). 

The learning process is supported by iterating the se- 
quence of defining focused goals, refining them into hy- 
potheses, and running experiments. These experiments 
can range from completely controlled experiments to reg- 
ular project executions. In each case we apply measure- 
ment and analysis procedures to project classes of inter- 
est. For each of those classes, a historical experience base 
needs to be established concerning the effectiveness of the 
candidate execution models, methods and tools. Using 
feedback from ongoing projects of the same class, the 
corresponding execution models, methods and tools can be 
refined and improved with respect to context and 
precision/detail so that we increase our potential to improve 
future projects.

The remaining seven (internal) requirements deal with 
user interface management, report generation, experience 
base, security and access control, configuration manage- 
ment control, SEE interface and distribution issues. All 
these issues are important in order to support planning, 
construction, learning and feedback effectively. 

Internal TAME requirements:

(R6) A homogeneous user interface. 

We distinguish between the physical and logical user 
interface. The physical user interface provides a menu or 
command driven interface between the user and the 
TAME system. Graphics and window mechanisms will be 
incorporated whenever useful and possible. The logical
user interface reflects the user’s view of measurement and 
analysis. Users will not be allowed to directly access data 
or run measurement tools. The only way of working with 
the TAME system is via a GQM model. TAME will 
enforce this top-down approach to measurement via its 
logical user interface. The acceptance of this kind of user
interface will depend on the effectiveness and ease with 
which it can be used. Homogeneity is important for both 
the physical and logical user interface. 

(R7) An effective mechanism for presenting data, 
information, and knowledge.

The presentation of analysis (measurement and 
interpretation) results via terminal or printer/plotter needs to
be supported. Reports need to be generated for different 
purposes. Project managers will be interested in periodi- 
cal reports reflecting the current status of their project. 
High level managers will be interested in reports indicat- 
ing quality and productivity trends of the organization. 
The specific interest of each person needs to be defined 
by one or more GQM models upon which automatic re- 
port generation can be based. A laser printer and multi- 
color plotter would allow the appropriate documentation 
of tables, histograms, and other kinds of textual and 
graphical representations. 

(R8) The effective storage and retrieval of all relevant 
data, information, and knowledge in an experience base. 

All data, information, and knowledge required to sup- 
port tailorability and tractability need to be stored in an 
experience base. Such an experience base needs to store 
GQM models, engineering products and measurement 
data. It needs to store data derived from the current proj- 
ect as well as historical data from prior projects. The ef- 
fectiveness of such an experience base will be improved 
for the purpose of learning and feedback if, in addition to 
measurement data, interpretations from various analysis 
sessions are stored. In the future, the interpretation rules 
themselves will become an integral part of such an 
experience base. The experience base should be implemented as
an abstract data type, accessible through a set of functions 
and hiding the actual implementation. This latter require- 
ment is especially important due to the fact that current 
database technology is not suited to properly support 
software engineering concepts [26]. The implementation of
the experience base as an abstract data type allows us to 
use currently available database technology and substitute 
more appropriate technology later as it becomes avail- 
able. The ideal database would be self-adapting to the 
changing needs of a project environment or an organiza- 
tion. This would require a specification language for soft- 
ware processes and products, and the ability to generate 
database schemata from specifications written in such a 
language [46]. 
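
The abstract data type idea can be illustrated with an 
interface that exposes only store and retrieve functions and 
hides the backend. The Python sketch below is an assumption- 
laden example: the class names, the two-method interface, and 
the in-memory backend are all invented for illustration.

```python
from abc import ABC, abstractmethod


class ExperienceBase(ABC):
    """Abstract interface; callers never see the storage technology."""

    @abstractmethod
    def store(self, kind: str, key: str, item: object) -> None:
        """Save an item (GQM model, product, measurement data, ...)."""

    @abstractmethod
    def retrieve(self, kind: str, key: str) -> object:
        """Return a previously stored item."""


class InMemoryExperienceBase(ExperienceBase):
    """Stand-in backend that a database-backed class could later replace."""

    def __init__(self) -> None:
        self._items = {}

    def store(self, kind, key, item):
        self._items[(kind, key)] = item

    def retrieve(self, kind, key):
        return self._items[(kind, key)]
```

Code written against `ExperienceBase` would not need to change 
if the in-memory class were swapped for one built on a more 
suitable database technology.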

(R9) Mechanisms allowing for the implementation of 
a variety of access control and security strategies. 

TAME must control the access of users to the TAME 
system itself, to various system functions and to the ex- 
perience base. These are typical functions of a security 
system. The enforced security strategies depend on the 
project organization. It is part of planning a project to
decide who needs to have access to what functions and 
pieces of data, information, and knowledge. In addition 
to these security functions, more sophisticated data access 
control functions need to be performed. The data access 
system is expected to “recommend” to a user who is 
developing a GQM model the kinds of data that might be
helpful in answering a particular question and support the 
process of choosing among similar data based on avail- 
ability or other criteria. 

(R10) Mechanisms allowing for the implementation of 
a variety of configuration management and control 
strategies.

In the context of the TAME system we need to manage 
and control three-dimensional configurations. There is 
first the traditional product dimension making sure that 
the various product and document versions are consistent. 
In addition, each product version needs to be consistent 
with its related measurement data and the GQM model 
that guided those measurements. TAME must ensure that 
a user always knows whether data in the experience base 
is consistent with the current product version and was 
collected and interpreted according to a particular model.
The actual configuration management and control strate- 
gies will result from the project planning activity. 
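
The consistency requirement links each measurement to a 
product version and to the GQM model that prescribed it. A 
minimal Python sketch of that link is shown below; the record 
type `Measurement` and the check `is_consistent` are 
hypothetical names introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    metric: str
    value: float
    product_version: str   # version of the product that was measured
    gqm_model_id: str      # model that prescribed the measurement


def is_consistent(m: Measurement, current_version: str, model_id: str) -> bool:
    """True if m was taken from the current product under the given model."""
    return m.product_version == current_version and m.gqm_model_id == model_id
```

Storing both identifiers with every value is what allows a 
user to be told whether the data still apply after a new 
product version has been entered.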

(R11) An interface to a construction-oriented SEE.

An interface between the TAME system (which auto- 
mates all process model components except for the con- 
struction component C3.1 of the TAME process model) 
and some external SEE (which automates the construction 
component) is necessary for three reasons: a) to enable 
the TAME system to collect data (e.g., the number of 
activations of a compiler, the number of test runs) directly 
from the actual construction process, b) to enable the 
TAME system to feed analysis results back into the on- 
going construction process, and c) to enable the construc- 
tion-oriented SEE to store/retrieve products into/from the 
experience base of the TAME system. Models for appro- 
priate interaction between constructive and analytic pro- 
cesses need to be specified. Interfacing with construction- 
oriented SEE’s poses the problem of efficiently intercon- 
necting systems implemented in different languages and 
running on different machines (probably with different op- 
erating systems). 

(R12) A structure suitable for distribution. 

TAME will ultimately run on a distributed system con- 
sisting of at least one mainframe computer and a number 
of workstations. The mainframes are required to host the 
experience base which can be assumed to be very large. 
The rest of TAME might be replicated on a number of 
workstations. 

B. Architecture 

Fig. 2 describes our current view of the TAME archi- 
tecture in terms of individual architectural components and 
their control flow interrelationships. The first prototype 
described in Section IV concentrates on the shaded com- 
ponents of Fig. 2. 

Fig. 2. The architectural design of the TAME system. 

We group the TAME components into five logical levels: 
the physical user interface, logical user interface, 
analysis and feedback, measurement, and support levels. 
Each of these five levels consists of one or more architec- 
tural components: 

• The Physical User Interface Level consists of one 
component: 

(A1) The User Interface Management component 
implements the physical user interface requirement R6. It 
provides a choice of menu or command driven access and 
supports a window-oriented screen layout. 

• The Logical (GQM-Oriented) User Interface Level 
consists of two components: 

(A2) The GQM Model Selection component imple- 
ments the homogeneity requirement of the logical user in- 
terface (R6). It guarantees that no access to the analysis 
and feedback, measurement, or support level is possible 
without stating the purpose for access in terms of a spe- 
cific GQM model. 

(A3) The GQM Model Generation component imple- 
ments requirement R1 regarding the operational and 
quantifiable definition of GQM models either from scratch 
or by modifying existing models. 

• The Analysis and Feedback Level consists of two 
components: 

(A4.1) This first portion of the Construction Inter- 
face component implements the feedback interface be- 
tween the TAME system and construction-oriented SEEs 
(part b of requirement R11). 

(A5) The GQM Analysis and Feedback component 
implements requirement R3 regarding execution and con- 
trol of an analysis and feedback session, interpretation of 


the analysis results, and proper feedback. All these activ- 
ities are done in the context of a GQM model created by 
A3. The GQM Analysis and Feedback component needs 
to have access to the specific authorizations of the user in 
order to know which analysis functions this user can per- 
form. The GQM Analysis and Feedback component also 
provides analysis functions, for example, telling the user 
whether certain metrics can be computed based upon the 
data currently available in the experience base. This anal- 
ysis feature of the subsystem is used for setting and op- 
erationally defining goals, questions, and metrics, as well 
as actually performing analyses according to those previ- 
ously established goals, questions, and metrics. 

• The Measurement Level consists of three compo- 
nents: 

(A4.2) This second portion of the Construction In- 
terface component implements the measurement interface 
between the TAME system and SEEs (part a of require- 
ment R11) and the SEE's access to the experience base of 
the TAME system (part c of requirement R11). 

(A6) The Measurement Scheduling component im- 
plements requirement R2 regarding the definition (and ex- 
ecution) of automated data collection strategies. Such 
strategies for when to collect data via the measurement 
tools may range from collecting data whenever they are 
needed for an analysis and feedback session (on-line) to 
collecting them periodically during low-load times and 
storing them in the experience base (off-line). 

(A7) The Measurement Tools component imple- 
ments requirement R2 regarding automated data collec- 
tion. The component needs to be open-ended in order to 
allow the inclusion of new and different measurement tools 
as needed. 

• The Support Level consists of three components: 

(A8) The Report Generation component implements 
requirement R7 regarding the production of all kinds of 
reports. 

(A9) The Data Entry and Validation component im- 
plements requirement R2 regarding the entering of man- 
ually collected data and their validation. Validated data 
are stored in the experience base component. 

(A10) The Experience Base component implements 
requirement R8 regarding the effective storage and re- 
trieval of all relevant data, information and knowledge. 
This includes all kinds of products, analytical data (e.g., 
measurement data, interpretations), and analysis plans 
(GQM models). This component provides the infrastruc- 
ture for the operation of all other components of the 
TAME process model and the necessary interactions 
among them. The experience base will also provide mech- 
anisms supporting the learning and feedback tasks. These 
mechanisms include the proper packaging of experience 
along the context and precision/detail dimensions. 

In addition, there exist two orthogonal components 
which for simplicity reasons are not reflected in Fig. 2: 

(A11) The Data Access Control and Security com- 
ponent(s) implement requirement R9. There may exist a 
number of subcomponents distributed across the logical 
architectural levels. They will validate user access to the 
TAME system itself and to various functions at the user 
interface level. They will also control access to the proj- 
ect experience through both the measurement tools and 
the experience base. 

(A12) The Configuration Management and Control 
component implements requirement R10. This compo- 
nent can be viewed as part of the interface to the experi- 
ence base level. Data can only be entered into or retrieved 
from the experience base under configuration manage- 
ment control. 
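
The two ends of the range of collection strategies described for the Measurement Scheduling component (A6) can be contrasted in a small sketch. The class and method names below (`Strategy`, `Scheduler`, `run_batch`, `get`) are assumptions made for illustration only; measurement tools are modelled as plain callables and the experience base as a dictionary.

```python
from enum import Enum
from typing import Callable, Dict


class Strategy(Enum):
    ON_LINE = "on-line"    # measure whenever an analysis session needs the data
    OFF_LINE = "off-line"  # measure periodically and store the results


class Scheduler:
    """Hypothetical scheduler choosing when measurement tools are run."""

    def __init__(self, tools: Dict[str, Callable[[], float]], strategy: Strategy):
        self.tools = tools
        self.strategy = strategy
        self.experience_base: Dict[str, float] = {}

    def run_batch(self) -> None:
        """Off-line collection: run every tool once and store its result."""
        for name, tool in self.tools.items():
            self.experience_base[name] = tool()

    def get(self, metric: str) -> float:
        """Return a metric value according to the configured strategy."""
        if self.strategy is Strategy.ON_LINE:
            return self.tools[metric]()          # tool runs at request time
        return self.experience_base[metric]      # KeyError if no batch has run
```

The on-line strategy trades response time for freshness, while the off-line strategy trades storage in the experience base for fast responses; this is the trade-off the requirement leaves to project planning.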

IV. First TAME Prototype 

The first in a series of prototypes is currently being de- 
veloped for supporting measurement in Ada projects [15]. 
This first prototype will implement only a subset of the 
requirements stated in Section III-A because of a) yet un- 
solved problems that require research, b) solutions that 
require more formalization, and c) problems with inte- 
grating the individual architectural components into a 
consistent whole. Examples of unsolved problems requir- 
ing further research are the appropriate packaging of the 
experience along the context and precision/detail dimen- 
sion and expert system support for interpretation pur- 
poses. Examples of solutions requiring more formaliza- 
tion are the GQM templates and the designing of a 
software engineering experience base. Examples of inte- 
gration problems are the embedding of feedback loops into 
the construction process, and the appropriate utilization 
of data access control and configuration management con- 
trol mechanisms. At this time, the prototype consists of 
pieces that are only partially implemented and pieces that 
have not yet been fully integrated. 

In this section, we discuss for each of the architectural 
components of this TAME prototype as many of the fol- 
lowing issues as are applicable: a) the particular approach 
chosen for the first prototype, b) experience with this ap- 
proach, c) the current and planned status of implementa- 
tion (automation) of the initial approach in the first TAME 
system prototype, and d) experiences with using the com- 
ponent: 

(A1) The User Interface Management component is 
supposed to provide the physical user interface for ac- 
cessing all TAME system functions, with the flexibility 
of choosing between menu and command driven modes 
and different window layouts. These issues are reasonably 
well understood by the SEE community. The first TAME 
prototype implementation will be menu-oriented and 
based upon the ‘X’ window mechanism. A primitive ver- 
sion is currently running. This component is currently not 
very high on our priority list. We expect to import a more 
sophisticated user interface management component at 
some later time or leave it completely to parties interested 
in productizing our prototype system. 

(A2) The GQM Model Selection component is sup- 
posed to force the TAME user to parameterize each 
TAME session by first stating the objective of the session 
in the form of an already existing GQM model or request- 
ing the creation of a new GQM model. The need for this 
restriction has been derived from the experience that data 
is frequently misused if it is accessible without a clear 
goal. The first prototype implementation does not enforce 
this requirement strictly. The current character of the first 
prototype as a research vehicle demands more flexibility. 
There is no question that this component needs to be im- 
plemented before the prototype leaves the research envi- 
ronment. 

(A3) The GQM Model Generation component is sup- 
posed to allow the creation of specific GQM models either 
from scratch or by modifying existing ones. We have pro- 
vided a set of templates and guidelines (Section II-B-2). 
We have been quite successful in the use of the templates 
and guidelines for defining goals, questions and metrics. 
There are a large number of organizations and environ- 
ments in which the model has been applied to specify what 
data must be collected to evaluate various aspects of the 
process and product, e.g., NASA/GSFC, Burroughs, 
AT&T, IBM, Motorola. The application of the GQM par- 
adigm at Hewlett Packard has shown that the templates 
can be used successfully without our guidance. Several of 
these experiences have been written up in the literature 
[4], [16], [17], [39], [48], [56], [60], [61]. We have been 
less successful in automating the process so that it ties 
into the experience base. As long as we know the goals 
and questions a priori, the appropriate data can be iso- 
lated and collected based upon the GQM paradigm. The 
first TAME prototype implementation is limited to sup- 
porting the generation of new models and the modification of 
existing models using an editor enforcing the templates 
and guidelines. We need to further formalize the tem- 
plates and guidelines and provide traceability between 
goals and questions. Formalization of the templates and 
providing traceability is our most important research is- 
sue. In the long run we might consider using artificial in- 
telligence planning techniques. 

(A4.1 and A4.2) The Construction Interface compo- 
nent is supposed to support all interactions between a SEE 
(which supports the construction component of the TAME 
process model) and the TAME system. The model in Fig. 
1 implies that interactions in both directions are required. 
We have gained experience in manually measuring the 
construction process by monitoring the execution of a va- 
riety of techniques (e.g., code reading [57], testing [20], 
and CLEANROOM development [61]) in various envi- 
ronments including the SEL [4], [48]. We have also 
learned how analysis results can be fed back into the on- 
going construction process as well as into corporate ex- 
perience [4], [48]. Architectural component A4.1 is not 
part of this first TAME prototype. The first prototype im- 
plementation of A4.2 is limited to allowing for the inte- 
gration of (or access to) external product libraries. This 
minimal interface is needed to have access to the objects 
for measurement. No interface for the on-line measure- 
ment of ongoing construction processes is provided yet. 

(A5) The GQM Analysis and Feedback component is 
supposed to perform analysis according to a specific GQM 
model. We have gained a lot of experience in evaluating 
various kinds of experiments and case studies. We have 
been successful in collecting the appropriate data by trac- 
ing GQM models top-down. We have been less successful 
in providing formal interpretation rules allowing for the 
bottom-up interpretation of the collected data. One auto- 
mated approach to providing interpretation and feedback 
is through expert systems. ARROWSMITH-P provides 
interpretations of software project data to managers [44]; 
it has been tested in the SEL/NASA environment. The 
first prototype TAME implementation triggers the collec- 
tion of prescribed data (top-down) and presents it to the 
user for interpretation. The user-provided interpretations 
will be recorded (via a knowledge acquisition system) in 
order to accumulate the necessary knowledge that might 
lead us to identifying interpretation rules in the future. 

(A6) The Measurement Scheduling component is sup- 
posed to allow the TAME user to define a strategy for 
actually collecting data by running the measurement tools. 
Choosing the most appropriate of many possible strate- 
gies (requirements, Section III-A) might depend on the re- 
sponse times expected from the TAME system or the stor- 
age capacity of the experience base. Our experience with 
this issue is limited because most of our analyses were 
human scheduled as needed [4], [48]. This component will 
not be implemented as part of the first prototype. In this 
prototype, the TAME user will trigger the execution of 
measurement activities explicitly (which can, of course, 


be viewed as a minimal implementation supporting a hu- 
man scheduling strategy). 

(A7) The Measurement Tools component is supposed 
to allow the collection of all kinds of relevant process and 
product data. We have been successful in generating tools 
to gather data automatically and have learned from the 
application of these tools in different environments. 
Within NASA, for example, we have used a coverage tool 
to analyze the impact of test plans on the consistency of 
acceptance test coverage with operational use coverage 
[53]. We have used a data bindings tool to analyze the 
structural consistency of implemented systems to their de- 
sign [41], and studied the relationship between faults and 
hierarchical structure as measured by the data bindings 
tool [60]. We have been able to characterize classes of 
products based upon their syntactic structure [35]. We 
have not, however, had much experience in automatically 
collecting process data. The first prototype TAME imple- 
mentation consists of measurement tools based on the 
above three. The first tool captures all kinds of basic Ada 
source code information such as lines of code and struc- 
tural complexity metrics [35], the second tool computes 
Ada data binding metrics, and the third tool captures dy- 
namic information such as test coverage metrics [65]. One 
lesson learned has been that the development of measure- 
ment tools for Ada is very often much more than just a 
reimplementation of similar tools for other languages. 
This is due to the very different Ada language concepts. 
Furthermore, we have recognized the importance of hav- 
ing an intermediate representation level allowing for a 
language independent representation of software product 
and process aspects. The advantage of such an approach 
will be that this intermediate representation needs to be 
generated only once per product or process. All the mea- 
surement tools can run on this intermediate representa- 
tion. This will not only make the actual measurement pro- 
cess less time-consuming but provide a basis for reusing 
the actual measurement tools to some extent across dif- 
ferent language environments. Only the tool generating 
the intermediate representation needs to be rebuilt for each 
new implementation language or TAME host environment. 

(A8) The Report Generation component is supposed to 
allow the TAME user to produce a variety of reports. The 
statistics and business communities have commonly ac- 
cepted approaches for presenting data and interpretations 
effectively (e.g., histograms). The first TAME prototype 
implementation does not provide a separate experience 
base reporting facility. Responsibility for reporting is at- 
tached to each individual prototype component; e.g., the 
GQM Model Generation component provides reports re- 
garding the models, each measurement tool reports on its 
own measurement data. 

(A9) The Data Entry and Validation component is sup- 
posed to allow the TAME user to enter all kinds of man- 
ually collected data and validate them. Because of the 
changing needs for measurement, this component must al- 
low for the definition of new (or modification of existing) 
data collection forms as well as related validation (integ- 
rity) rules. If possible, the experience base should be ca- 
pable of adapting to new needs based upon new form def- 
initions. We have had extensive experience in designing 
forms and validation rules, using them, and learning 
about the complicated issues of deriving validation rules 
[4], [48]. The first prototype implementation will allow 
the TAME user to input off-line collected measurement 
data and validate them based upon a fixed and predefined 
set of data collection forms [currently in use in NASA’s 
Software Engineering Laboratory (SEL)]. This compo- 
nent is designed but not yet completely implemented. The 
practical use of the TAME prototype requires that this 
component provide the flexibility for defining and ac- 
cepting new form layouts. One research issue is identi- 
fying the easiest way to define data collection forms in 
terms of a grammar that could be used to generate the 
corresponding screen layout and experience base struc- 
ture. 

(A10) The Experience Base component allows for ef- 
fective storage and retrieval of all relevant experience 
ranging from products and process plans (e.g., analysis 
plans in the form of GQM models) to measurement data 
and interpretations. The experience base needs to mirror 
the project environment. Here we are relying on the ex- 
perience of several faculty members of the database group 
at the University of Maryland. It has been recognized that 
current database technology is not sufficient, for several 
reasons, to truly mirror the needs of software engineering 
projects [26]. The first prototype TAME implementation 
is built on top of a relational database management sys- 
tem. A first database schema [46] modeling products as 
well as measurement data has been implemented. We are 
currently adding GQM models to the schema. The expe- 
riences with this first prototype show that the amount of 
experience stored and its degree of formalism (mostly 
data) is not yet sufficient. We need to better package that 
data in order to create pieces of information or knowl- 
edge. The GQM paradigm provides a specification of what 
data needs to be packaged. However, without more for- 
mal interpretation rules, the details of packaging cannot 
be formalized. In the long run, we might include expert 
system technology. We have also recognized the need for 
a number of built-in GQM models that can either be reused 
without modification or guide the TAME user during the 
process of creating new GQM models. 

(A11) The Data Access Control and Security compo- 
nent is supposed to guarantee that only authorized users 
can access the TAME system and that each user can only 
access a predefined window of the experience base. The 
first prototype implements this component only as far as 
user access to the entire system is concerned. 

(A12) The Configuration Management and Control 
component is supposed to guarantee consistency between 
the objects of measurement (products and processes), the 
plans for measurement (GQM models), the data collected 
from the objects according to these plans, and the at- 


tached interpretations. This component will not be imple- 
mented in the first prototype. 
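
The intermediate-representation idea discussed under the Measurement Tools component (A7) can be sketched as follows. This is a deliberately crude illustration, not the prototype's design: the "front ends" only count non-blank lines and leading keywords, and all names (`Unit`, `make_front_end`, `FRONT_ENDS`, `size`, `cyclomatic_estimate`) are hypothetical. The point is only the structure: one front end per language produces a language-independent record, and every metric function runs on that record.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Unit:
    """Language-independent record for one program unit."""
    name: str
    lines: int       # number of non-blank source lines
    decisions: int   # number of lines that open a branching construct


def make_front_end(keywords: List[str]) -> Callable[[str, str], Unit]:
    """Build a toy front end that recognizes branches by their first token."""
    def front_end(name: str, source: str) -> Unit:
        lines = [ln for ln in source.splitlines() if ln.strip()]
        decisions = sum(ln.split()[0].lower() in keywords for ln in lines)
        return Unit(name, len(lines), decisions)
    return front_end


# Only this table has to grow when a new implementation language is added.
FRONT_ENDS: Dict[str, Callable[[str, str], Unit]] = {
    "ada": make_front_end(["if", "elsif", "while", "for", "when"]),
    "fortran": make_front_end(["if", "do"]),
}


# Measurement tools operate on the intermediate representation only.
def size(unit: Unit) -> int:
    return unit.lines


def cyclomatic_estimate(unit: Unit) -> int:
    return unit.decisions + 1
```

Under these assumptions, `size` and `cyclomatic_estimate` are reused unchanged for Ada and Fortran input; only the entry in `FRONT_ENDS` differs per language.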

The integration of all these architectural components is 
incomplete. At this point in time we have integrated the 
first versions of the experience base, three measurement 
tools, a limited version of the GQM analysis and feedback 
component, the GQM generation component, and the user 
interface management component. Many of the UNIX® 
tools (e.g., editors, print facilities) have been integrated 
into the first prototype TAME system to compensate for 
yet missing components. This subset of the first prototype 
is running on a network of SUN-3’s under UNIX. It is 
implemented in Ada and C. 

This first prototype enables the user to generate GQM 
models using a structured editor. Existing models can be 
selected by using a unique model name. Support for se- 
lecting models based on goal definitions or for reusing 
existing models for the purpose of generating new models 
is offered, but the refinement of goals into questions and 
metrics relies on human intervention. Analysis and feed- 
back sessions can be run according to existing GQM 
models. Only minimal support for interpretation is pro- 
vided (e.g., histograms of data). Measurement data are 
presented to the user according to the underlying model 
for his/her interpretation. Results can be documented on 
a line printer. The initial set of measurement tools allows 
only the computation of a limited number of Ada-source- 
code-oriented static and dynamic metrics. Similar tools 
might be used in the case of Fortran source code [33]. 
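
The top-down tracing of a GQM model, from goal through questions to the metrics that must be collected, can be sketched with a small data structure. The classes and functions below (`Goal`, `Question`, `required_metrics`, `missing_metrics`) are illustrative assumptions and do not reproduce the prototype's templates or its structured editor.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Question:
    text: str
    metrics: List[str] = field(default_factory=list)


@dataclass
class Goal:
    purpose: str
    questions: List[Question] = field(default_factory=list)


def required_metrics(goal: Goal) -> Set[str]:
    """Trace the model top-down: goal -> questions -> metrics."""
    return {m for q in goal.questions for m in q.metrics}


def missing_metrics(goal: Goal, available: Set[str]) -> Set[str]:
    """Metrics the model prescribes that the experience base cannot yet supply."""
    return required_metrics(goal) - available
```

A session run against such a model would first call `missing_metrics` to decide which measurement tools need to be triggered, then present the collected values to the user for interpretation, since the bottom-up interpretation rules are not yet formalized.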

V. Summary and Conclusions 

We have presented a set of software engineering and 
measurement principles which we have learned during a 
dozen years of analyzing software engineering processes 
and products. These principles have led us to recognize 
the need for software engineering process models that in- 
tegrate sound planning and analysis into the construction 
process. 

In order to achieve this integration the software engi- 
neering process needs to be tailorable and tractable. We 
need the ability to tailor the execution process, methods 
and tools to specific project needs in a way that permits 
maximum reuse of prior experience. We need to control 
the process and product because of the flexibility required 
in performing such a focused development. We also need 
as much automated support as possible. Thus an inte- 
grated software engineering environment needs to support 
all of these issues. 

In the TAME project we have developed an improve- 
ment-oriented (integrated) process model. It stresses a) 
the characterization of the current status of a project en- 
vironment, b) the planning for improvement integrated 
into software projects, and c) the execution of the project 
according to the prescribed project plans. Each of these 
tasks must be dealt with from a constructive and analytic 
perspective. 

® UNIX is a registered trademark of AT&T Bell Laboratories. 

To integrate the constructive and analytic aspects of 
software development, we have used the GQM paradigm. 
It provides a mechanism for formalizing the characteriza- 
tion and planning tasks, controlling and improving proj- 
ects based on quantitative analysis, learning in a deeper 
and more systematic way about the software process and 
product, and feeding back the appropriate experience to 
current and future projects. 

The effectiveness of the TAME process model depends 
heavily on appropriate automated support by an ISEE. The 
TAME system is an instantiation of the TAME process 
model into an ISEE; it is aimed at supporting all aspects 
of characterization, planning, analysis, learning, and 
feedback according to the TAME process model. In ad- 
dition, it formalizes the feedback and learning mecha- 
nisms by supporting the synthesis of project experience, 
the formalization of its representation, and its tailoring 
towards specific project needs. It does this by supporting 
goal development into measurement via templates and 
guidelines, providing analysis of the development and 
maintenance processes, and creating and using experience 
bases (ranging from databases of historical data to knowl- 
edge bases that incorporate experience from prior proj- 
ects). 

We discussed a limited prototype of the TAME system, 
which has been developed as the first of a series of pro- 
totypes that will be built using an iterative enhancement 
model. The limitations of this prototype fall into two cat- 
egories: limitations of the technology and the need to bet- 
ter formalize the model so that it can be automated. 

The short range (1-3 years) goal for the TAME system 
is to build the analysis environment. The mid-range goal 
(3-5 years) is to integrate the system into one or more 
existing or future development or maintenance environ- 
ments. The long range goal (5-8 years) is to tailor those 
environments for specific organizations and projects. 

The TAME project is ambitious. It is assumed it will 
evolve over time and that we will learn a great deal from 
formalizing the various aspects of the TAME project as 
well as integrating the various paradigms. Research is 
needed in many areas before the idealized TAME system 
can be built. Major areas of study include measurement, 
databases, artificial intelligence, and systems. Specific 
activities needed to support TAME include: more for- 
malization of the GQM paradigm, the definition of better 
models for various quality and productivity aspects, 
mechanisms for better formalizing the reuse and tailoring 
of project experience, the interpretation of metrics with 
respect to goals, interconnection languages, language in- 
dependent representation of software, access control in 
general and security in particular, software engineering 
database definition, configuration management and con- 
trol, and distributed system architecture. We are inter- 
ested in further researching the ideas and prin- 
ciples of the TAME project. We will build a series of 


evolving prototypes of the system in order to learn and 
test out ideas. 
Abstract 


This paper presents a conceptual model of 
software development resource data and validates 
the model by reference to the published literature 
on necessary resource data for development sup- 
port environments. The conceptual model 
presented here was developed using a top-down 
strategy. A resource data model is a prerequisite 
to the development of integrated project support 
environments which aim to assist in the processes 
of resource estimation, evaluation and control. 
The model proposed is a four dimensional view of 
resources which can be used for resource estima- 
tion, utilization, and review. The model is vali- 
dated by reference to three publications on 
resource databases, and the implications of the 
model arising out of these comparisons are dis- 
cussed. 


Keywords: software process, methods, tools, 
conceptual model, resources, estimation, environ- 
ments, software engineering database, validation 


INTRODUCTION 


This paper has two major aims: 

1) To briefly present a top-down characterization 
(TDC) structure of software project resource data, 
which aims to facilitate: 

1. Further accumulation of knowledge of pro- 
ject resource characteristics and metrics within 
a theoretical structure. 

2. The storage of project resource data in a 
generalized structured way so that estimation, 
evaluation, and control can be facilitated using 
an organized quantitative and qualitative data 
base. 

2) To validate this structure against published 
resource data models. 

The characterization structure of resource data is 
a prerequisite to the development of an Integrated 
Project Support Environment (IPSE) in which it 
is possible to: 

1. Objectively choose appropriate software 
processes. 

2. Estimate the process characteristics such as 
time, cost, and quality, 

3. Evaluate the extent to which the resource 
aims are being met during development, and 

4. Improve the software process and product. 


To date, the approach taken to the accumulation 
of knowledge concerning the software process has 
been largely bottom-up. Studies have been carried 
out to determine the existence and nature of pro- 
ject relationships. These studies, such as [Wolverton 74], 
[Nelson 67], [Chrysler 78], [Sackman et al. 68], 
[Basili, Panlilio-Yap 85], [Basili, Freburger 81], 
[Basili, Selby, Phillips 83], [Walston, Felix 77], 
[Jeffery 87a, 87b], and [Jeffery, Lawrence 1979, 1985], 
have explored the relationships 

between project variables, searching for an under- 
standing of the software process and product. For 
example, relationships between effort and size, 
errors and methods, and test strategy and bug 
identification, have been found. 


*This research was funded in part by NASA Grant 
NSG-5123 to University of Maryland 


The structure presented and validated here is a 
part of the TAME (Tailoring A Measurement 
Environment) project which seeks to develop an 
integrated software project measurement, 
analysis, and evaluation environment. This 
environment is based in part on the evolutionary 
improvement paradigm [discussed in Basili, Rom- 
bach 87]. It is also based on the "Goal-Question- 
Metric" paradigm outlined in [Basili 85] and 
[Basili, Weiss 84]. 

The aims of this paper are firstly to present the 
TDC structure or model for the perception of 
software development resources which will assist 
in the process of taking those aims of, say, a 
development manager and translating them into a 
set of questions and metrics which can be used to 
measure the software process. It is meant to be 
independent of the particular process model used 
for development and maintenance. A full descrip- 
tion of the model, including its dynamic nature, is 
given in [Jeffery, Basili 87a and 87b]. The 
paper secondly aims to validate the model by a 
comparison of the model with the resource data 
models presented in the literature. 


2. THE PROJECT ENVIRONMENT 
CHARACTERISTICS 

Resources are consumed during the software pro- 
cess in order to deliver a software product. The 
software process has overall characteristics which 
are super-ordinate to the resources consumed. 
Therefore, before resource data can be character- 
ized it is necessary that a process characterization 
profile be established. This characterization 
includes data on factors such as: 

project type 

organizational development conventions 
project manager preferences 
target computer system 
development computer system 
project schedules or milestones 
project deliverables 

In this data the broad project and its environment 
characteristics are established. For example, is the 
process using evolutionary development or a 
waterfall method? Is the project to be developed 
by in-house staff or external contractors? What 
organizational constraints are being imposed on 
the project development time? What management 
constraints are being imposed, say on staffing lev- 
els? 

These factors form the environment in which the 
software process must occur, and will therefore 
determine, in many ways, the nature of that 
software process. A simple example of this is the 
question of the process model - evolutionary or 
waterfall. This constraint establishes milestones 
and the pattern of resource use, and therefore 
partially determines the interpretation of the 
resource data collected. 
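
As an illustration only, the characterization profile described in this section could be held as a simple record. The sketch below is in Python; the class name `ProjectProfile`, its field names, and the sample values are assumptions made for this example and are not part of the model as published.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProjectProfile:
    """Hypothetical record of the project environment characteristics."""
    project_type: str                   # e.g. "real time", "business application"
    process_model: str                  # e.g. "evolutionary" or "waterfall"
    development_conventions: List[str] = field(default_factory=list)
    manager_preferences: List[str] = field(default_factory=list)
    target_computer: str = ""
    development_computer: str = ""
    milestones: Dict[str, str] = field(default_factory=dict)  # name -> planned date
    deliverables: List[str] = field(default_factory=list)
    constraints: List[str] = field(default_factory=list)      # e.g. staffing limits


# Sample values are invented for illustration.
profile = ProjectProfile(
    project_type="real time",
    process_model="waterfall",
    milestones={"design review": "1988-03-01"},
    constraints=["staffing level capped by management"],
)
```

Recording the process model alongside the constraints keeps the context needed to interpret the resource data collected later.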


3. THE RESOURCE CLASSIFICATION 

At the level below the characterization of the pro- 
ject and its environment we are interested in clas- 
sifying the resources consumed in the generation 
of the software product. In this section of the 
paper we present a structure for that 
classification. This structure covers only the 
resource aspect of the project and is therefore 
only concerned with the software process and the 
resources consumed or used in the process. The 
model is not concerned with the software product. 
As stated above, the resource model was first
developed and presented in [Jeffery, Basili 87].


The model structure consists of a four dimen- 
sional view. This four dimensional view is divided 
into two segments: 

1. resource type, and 

2. resource use 

In a software process the two segments being
separated are (1) the nature and characteristics of
the resource, and (2) the manner in which we look
at or consider the consumption of that resource.

3.1 Resource Type 

In the first segment we are concerned with classi-
fying the nature of the resource: is it someone's
time, or a physical object such as a computer, or 
a logical object such as a piece of software? We 
are also interested in describing the properties of 
those resources such as description, model 
number, and cost per unit of consumption. 

By decomposing the resources into different types,
different views of the resources can be provided.
For example, it may be important for operations 
personnel to know a breakdown of the hardware 
resources used on a project according to the 
different physical machines being used, whereas 
from a project manager’s perspective at a point in 
time, the specific machine may not be of interest, 
but the availability of a certain class of machine 
may be critical. Resource managers will be 
interested in the types of resources available (for 
example, people), and the characteristics of those 
resources for project planning purposes. Thus the 
categorization provided here is the basis of the 
resource management environment, in that it is in 
this segment of the model that the resources are 
listed and described. 

The resources of a software project can be 
classified as: 

- hardware 

- software 

- human 

- support (supplies, materials, 
communications facility costs, etc.) 

These categories are meant to be mutually
exclusive and exhaustive, so that each instance of
resource data falls into exactly one of the
categories.

Hardware resources encompass all equip- 
ment used or potentially able to be used in the 
environment under consideration. (For example, 
target and development machines, terminals, 
workstations). 

Software resources encompass all previ-
ously existing programs and software systems
used or potentially able to be used in the environ-
ment under consideration (for example, com-
pilers, operating systems, utility routines, previ-
ously existing application software).

Human resources encompass all the people
used or potentially able to be used for
development, operations, and maintenance in the
environment under consideration, whether internal
or external (subcontractors, consultants, etc.).

Support resources encompass all of the 
additional facilities such as materials, communica- 
tions, and supplies which are used or potentially 
able to be used in the environment under con- 
sideration. 

The values associated with these resources may be 
stored in both price and volume measures, where 
volume means, for example, hours of use or avai- 
lability, or the number of times a resource is 
needed, and price refers to the $ values associated 
with that resource. This may be a cost per unit 
measure or a cost per period of time. 
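
A minimal sketch of the resource type segment follows, assuming one record per resource with a volume unit and a price per unit. The names `ResourceType`, `Resource`, and `cost`, and the figures used, are hypothetical and serve only to show how price and volume measures combine.

```python
from dataclasses import dataclass
from enum import Enum


class ResourceType(Enum):
    """The four mutually exclusive resource categories."""
    HARDWARE = "hardware"
    SOFTWARE = "software"
    HUMAN = "human"
    SUPPORT = "support"


@dataclass
class Resource:
    name: str
    rtype: ResourceType
    unit: str             # volume measure, e.g. "hour"
    cost_per_unit: float  # price measure, dollars per unit

    def cost(self, volume: float) -> float:
        """Dollar value of consuming `volume` units of this resource."""
        return volume * self.cost_per_unit


# Invented example: 120 hours of an analyst's time at $40 per hour.
analyst = Resource("analyst", ResourceType.HUMAN, "hour", 40.0)
total = analyst.cost(120)
```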

This four-way classification provides an initial 
resource-type decomposition. The aim in this 
decomposition is to separate the major resource 
elements that are used in the software process in 
order to provide manageability. This initial 
separation is necessary because of the very 
different nature of each of these resource types 
and the consequent difference in attributes and 
management techniques which are necessary in 
the estimation, evaluation, and control of each of 
these resource categories. 

Further decomposition within this segment may
be desirable and will depend on the goals of
the responsible persons. The number of different
possibilities increases as the decomposition contin-
ues within each of the major resource categories.
For example, the exact nature of the resource
decomposition within the hardware category will
vary significantly from one organization to
another because of the different hardware utilized
and the organizational structure surrounding that
hardware utilization. It may, for instance, be
desirable to decompose hardware into target and
development hardware if there is a difference, and
software into operating systems and
languages/editors in order to model, say, the
availability of cross-compilers.


3.2 Resource Use

Over the type segment we need to impose the
second segment: the "use" structure. The categor-
ization within this dimension allows resource
consumption to be associated with different per-
spectives of the software process. It
is through this use structure that we are able to
distinguish, for example,

between prior-project expectations of consump-
tion and resources actually consumed, or

between resources consumed in each phase of
the project, or

between the utilization of a resource and the
availability of that resource, or

between an ideal view of resource planning and
the resources actually available.

The use structure consists of:

1. INCURRENCE 

1.1 Estimated 

1.2 Actual 


2. AVAILABILITY 

2.1 Desirable 

2.2 Accessible 

2.3 Utilized 


3. USE DESCRIPTORS 

3.1 Work type 

3.2 Point in Time 

3.3 Resources Utilized 


3.2.1 Incurrence 

This category allows the resource information to 
be gathered and used in a manner suitable to the 
management of the resource. It is necessary, for 
example, to store data on estimated resource 
usage, resource requirements, and resource availa- 
bility. 

This data is necessarily kept separate from the 
actual resource incurrence or use, which is stored 
via the actual category. 

These two categories then permit process tracking 
via comparisons between them and extrapolation 
from the actual data. At the project summary 
points, explanations and defined data accumula- 
tions on estimated and actual resource use provide 
feedback on the process. This feedback should 
contain reasons for variance between the 
estimated and actual so that a facility for cor- 
porate memory can be established and the neces- 
sary data stored to facilitate and explain any 
updates of the current resource values. It needs to 
be noted that the model proposed allows for 
different estimates and actuals at different points 
in time. 
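
The sketch below illustrates, under stated assumptions, how estimates and actuals recorded at different points in time might be kept side by side with explanatory feedback text. The `Entry` record, the `latest` helper, and all dates, hours, and notes are invented for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Entry:
    recorded_on: date
    incurrence: str   # "estimated" or "actual"
    hours: float
    note: str = ""    # feedback text explaining a change or variance


def latest(entries: List[Entry], incurrence: str) -> Entry:
    """Most recent entry in the given incurrence category."""
    matching = [e for e in entries if e.incurrence == incurrence]
    return max(matching, key=lambda e: e.recorded_on)


# Earlier values are kept, not overwritten, so the history stays available.
log = [
    Entry(date(1988, 1, 4), "estimated", 400.0),
    Entry(date(1988, 3, 1), "estimated", 460.0, "requirements grew after review"),
    Entry(date(1988, 4, 1), "actual", 500.0, "integration took longer than planned"),
]

# Variance between the latest actual and the latest estimate, in hours.
variance = latest(log, "actual").hours - latest(log, "estimated").hours
```

Keeping the note next to each value is one simple way to hold the reasons for variance that the text asks for.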

The two classifications are the basis for the struc- 
ture proposed because they constitute significantly 
different viewpoints on the process, and again pro- 
vide mutually exclusive categorization which will 
facilitate management estimation, evaluation, and 
control. 

This structure requires that process data, as it
changes in value during the project, will not be
lost but will be stored in an accessible manner so
that meaningful analysis of projects can be carried
out using a database that provides complete
details of the project history.

This philosophy specifically addresses the need for
a corporate memory concerning past projects. By
implementing such a structured project log the
basic data for such a memory is available in
numeric and text format.


3.2.2 Availability 

This category allows storage of a resource use by:

- desirable 

- accessible 

- utilized 

This categorization provides further refinement of 
the resource data. Through this and, say, the
incurrence category, it is possible to compare the
actual resources utilized with the estimated utili- 
zation, and then trace possible reasons for vari- 
ance through the desirable and accessible dimen- 
sions. That is, differences between planned availa- 
bility and actual availability of a resource will be 
significant in understanding the software resource 
utilization that occurred during the process. 

Desirable is defined as all the resources 
that are reasonably expected to be of value on the 
project. 

Accessible is a subset of desirable (when 
considering the project resources only) and is used 
to define the resources which are able to be used 
on the project. 

The difference between desirable and accessible is 
those resources seen as desirable for the project 
but which were not available for use during the 
project. This difference may occur, for example, 
because of budget constraints or inability to 
recruit staff. The desirable resource list permits an
"ideal" planning view. When compared with
accessible it allows management to see the
compromises that were made in establishing the 
project, thus facilitating a very explicit basis for 
risk management within the resource database. 
The database is thereby able to hold views of not 
only the resources actually applied to the project 
but also those resources which were considered to 
be desirable along with the reasons for their use 
or non-use. In this way the resource trade-offs are 
made explicit. 

Utilized is a subset of accessible and is 
defined as the resources which are used in a pro- 
ject. 

The difference between accessible and utilized 
represents those resources available for the project 
but not used. This difference will arise for one of
three possible reasons:

1. The resources prove to be inappropriate for
the project under consideration, or

2. The resources are appropriate but they are
excess to those needed, or

3. The resources are appropriate, and their use
is contingent on an uncertain future event.

The use of these storage categories is somewhat 
complex and is explained in detail further below 
in section 3.4.2. 

Through this availability category we are able to 
distinguish between: 

(1) the resources which are reasonably expected 
to be beneficial to the process (desirable), 

(2) the resources which exist in the organization 
and are able to be used if needed (accessible), 
and 

(3) the resources which are used in a project
(utilized).

Through this categorization it is then possible to
track resource usage, to pinpoint the use or
non-use of resources, and to ascribe reasons, particularly to
their non-use, as in the case of non-accessibility.
As in the INCURRENCE category, the reasons
for divergence between desirable, accessible, and 
utilized are stored in a feedback facility. 
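
The subset relationship between the three availability categories can be illustrated with sets. This is a sketch only: the resource names and the reasons recorded are invented, and a real resource database would hold far richer records.

```python
# Hypothetical resources for one project.
desirable = {"senior designer", "test engineer", "cross-compiler", "second workstation"}
accessible = {"test engineer", "cross-compiler", "second workstation"}
utilized = {"test engineer", "cross-compiler"}

# Each category is a subset of the one above it.
assert utilized <= accessible <= desirable

not_accessible = desirable - accessible  # wanted, but not available to the project
not_used = accessible - utilized         # available, but not used

# Feedback facility: reasons for divergence between the categories.
reasons = {
    "senior designer": "unable to recruit staff",
    "second workstation": "excess to those needed",
}
```

The two set differences are exactly the compromises and the unused capacity that the text suggests recording with reasons.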

3.2.3 Use Descriptors 

This category provides a description of the con-
sumption of the resource item in terms of three
essential characteristics of the consumption of that
item:

1. The Nature of the Work being done by
the resource (e.g. coding, inspecting, or
designing): This category can be used in con-
junction with other views to distinguish
between process activities, such as human
resources estimated to be desirable in design
work, or machine resources actually utilized
in testing, or elapsed time implications of
inspections.

2. Point in Calendar Time: This category
pinpoints the resource item by calendar
time. In this way resource items (estimated
or actual; desirable, accessible, or utilized)
are associated with a specific point in time
or period of time. This facilitates tracing of
time dependent relationships and the com-
parison of resource values over time.

3. Resources Utilized: This category meas-
ures the extent of resource consumption in
terms of hours, dollars, units, or whatever is
the appropriate measure of use.

The Use Descriptors also provide the link to the 
work breakdown structure which is commonly 
embodied in process models. This link is esta- 
blished through the association of a particular 
piece of work being done at a point in time with
the work package described in the work break-
down structure. This point is discussed further
below in Section 4, Validating the Model.

3.3 COMBINING THE VIEWS

The structure suggested here can be viewed as a 
hierarchy for the purpose of explanation. Such a 
hierarchy is shown in Figure 1. 


FIGURE 1. THE STRUCTURE OF THE TDC MODEL

In this figure we see that the proposed structure 
views the software project (which has attributes 
describing that project) consuming resources. The 
resources are characterized as having four dimen- 
sions of interest (type, use, incurrence, and availa- 
bility). At the resource type level we describe each 
resource as being one of hardware, software, 
human, or support, and having various attributes. 
The attributes for each of these four types will be 
different in nature. For example, the human attri- 
butes might include name, address, organizational 
unit, skills, pay rate, unit cost, age, and so forth. 
The attributes for hardware will be quite 
different, describing manufacturer, purchase date, 
memory capacity, network connections, or similar 
types of characteristics. 

At the next level in the diagram we model the use 
of the resource. In the first instance this involves 
the type of work that the resource is performing, 
the point (or span) in calendar time at which the 
work is being done, and the measure of the 
amount of work done. This last measure (amount 
of work) might be expressed in person-time, 
execution-time, connect-time, or whatever is the 
relevant measure of work for the resource 
instance. 

The use of the resource is then described as being 
either estimated or actual, and both of these may 
be desirable, accessible, or utilized. In this way 
the following concepts are supported:

1. Estimated Desirable: The resources con-
sidered "ideal" at various stages of the planning
process.

2. Estimated Accessible: The resources
which are expected to be available for use in the
process, given the constraints imposed on the
software process (a contingency plan).

3. Estimated Utilized: The resources which
it is anticipated will be used in the software pro-
cess.

4. Actual Desirable: With hindsight, the
resources which proved to be the "ideal", consider-
ing the events that occurred in the software pro-
cess. A part of the learning process.

5. Actual Accessible: Again with hindsight,
the resources which were actually available and
could have been utilized. A part of the learning
process.

6. Actual Utilized: The resources actually
used in the software process. 
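
The six concepts are simply the cross product of the two incurrence categories with the three availability categories. The sketch below enumerates them in the order listed above. The `role` labels ("planning", "learning", "tracking") are shorthand assumed for this example to summarize how each category is used; they are not terms defined by the model.

```python
from itertools import product

INCURRENCE = ("estimated", "actual")
AVAILABILITY = ("desirable", "accessible", "utilized")

# Six (incurrence, availability) pairs, in the order of the list above.
categories = list(product(INCURRENCE, AVAILABILITY))


def role(incurrence: str, availability: str) -> str:
    """Assumed shorthand for how each category is used."""
    if incurrence == "estimated":
        return "planning"   # categories 1 to 3
    if availability == "utilized":
        return "tracking"   # category 6
    return "learning"       # categories 4 and 5


for number, (inc, avail) in enumerate(categories, start=1):
    print(number, inc, avail, role(inc, avail))
```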

Categories one through three are used initially for 
planning purposes. The numeric and text values 
associated with each of these three categories may 
be derived from: 

a. individual or group knowledge 

b. a knowledge base 

c. a database of prior projects, and/or 

d. algorithmic models 

At the very simplest level, the planning process
might establish only numeric values in the
estimated utilized category based on individual
knowledge alone. In essence, this is the only form
of estimation used in many organizations, wherein
project schedules and budgets are established by
an individual, based on that individual's experi-
ence. These estimates represent the expected pro-
ject and resource characteristics for the duration
of the project.

The extensions suggested here allow these esti-
mates to be enlarged in the following dimensions:

The nature of the estimate 

The source of the estimates 

The timing of the estimates 

1. The nature of the estimate. The model 
allows project and resource managers to distin- 
guish between desirable, accessible, and utilized 
estimates as discussed above. The estimated desir- 
able dimension would be used at a fairly high 
level in the project planning process to outline the 
hardware, software, people, and support resources 
that are considered to be desirable for the project. 
This may list specific pieces of hardware and
software which are desirable at certain points in
time. It might also be used to list characteristics
of the people (such as skills) that would be ideal
on the project. The accessible dimension would
then reflect the expected resources that will actu-
ally be available to be used. Again this could be
at a fairly high level, indicating the resources
available, the differences between these and those
desirable, and the reasons why the two categories
do not agree, such as cost constraints or risk
attitudes which have been adopted as part of the
project management profile. The utilized category
would normally extend to a lower level in terms of 
the project plan, detailing estimated resources 
perhaps down to the work package level and short 
periods of time. 

2. The source of the estimates. It was sug-
gested above that there are four major possible
sources for these estimates: individuals or groups
of people, a knowledge base, a database of prior 
projects, and algorithmic models of the process. 
Each of these should be supported in a measure- 
ment environment, and each has significant impli- 
cations with respect to the design of such an 
environment. The current state of the art appears 
well equipped to support algorithmic models of 
some parts of the estimation process (for example, 
estimates of project effort based on one of the 
many available estimation packages such as 
COCOMO [Boehm 81], SLIM [Putnam 81], SPQR 
[Jones 86]). Similarly the tools available in the 
database environment allow the storage and 
retrieval of numeric data on past projects. How- 
ever the storage and searching of large volumes of 
text data on prior projects, the use of a 
knowledge base, and the support of group decision 
support processes are all the subject of current 
research (see, for example, [Bernstein 87],
[Nunamaker et al. 86], [Barstow 87], [Valett 87]).

3. The timing of the estimates. In the struc-
ture suggested, all estimates may be made before
the commencement of the software process and 
also at any point in time during the process. How- 
ever there are certain points in time during the 
process at which estimates are more likely to be 
updated. These are: 

1. at project milestones 

2. at manager initiated points in time at 
which major divergence between estimate 
and actual is recognized by the manager 

3. at system initiated points in time at 
which the measurement system recognizes a 
potentially significant divergence between 
estimate and actual 

The third possibility implies that the measure- 
ment system is able to intelligently recognize the 
existence of a problem with respect to the com-
parison of actual and estimate. This facility is
suggested because one of the major
management stumbling blocks is generally not
taking action once a problem is
identified, but identifying the problem in
the first place. This identification problem occurs
because of the volume of data that needs to be
processed in order to recognize a potential prob-
lem state. It is the measurement environment
which is expert at processing the data volume. It
is the manager who is expert at taking corrective 
action once the problem is highlighted. 
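
A system initiated trigger of this kind could be as simple as a relative divergence test applied to every estimate and actual pair. The sketch below is illustrative only. The function `needs_review`, the 10 percent tolerance, and the task figures are assumptions; the model does not prescribe a threshold or a recognition rule.

```python
def needs_review(estimated: float, actual: float, tolerance: float = 0.10) -> bool:
    """True when actual diverges from the estimate by more than `tolerance`
    (expressed as a fraction of the estimate)."""
    if estimated == 0:
        return actual != 0
    return abs(actual - estimated) / abs(estimated) > tolerance


# Invented effort to date per work type: (estimated hours, actual hours).
to_date = {
    "design": (300.0, 310.0),
    "coding": (500.0, 640.0),
    "testing": (200.0, 150.0),
}

# Work types whose divergence should be brought to the manager's attention.
flagged = sorted(task for task, (est, act) in to_date.items() if needs_review(est, act))
```

The measurement system scans the volume of data and the manager sees only the flagged items, which matches the division of labour described above.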

Categories four (actual desirable) and five (actual 
accessible) of the structure exist to provide a feed- 
back and learning dimension to the project data- 
base. These values would be determined after the 
project is complete. Comparing the
estimates made at various stages of the process
with these two categories facilitates a process in
which the organization can learn from the
variances between expectations and actuals which have
occurred in past projects. As with the esti-
mates, the categories of desirable and accessible
are used in order to allow the comparison of
"actual ideal" with "actual available" so that an
ex-post view of the management of the process
can be captured. The question being asked here is:
"How could we have handled resources better?" It
is a learning mechanism to generate explicit new 
knowledge for the knowledge and data bases, and 
also to improve individual and group knowledge. 

Category six (actual utilized) will be the most 
active category within the structure, carrying all 
of the values associated with the resources of the 
project. These values will be updated on a regular 
basis throughout the software process, and will be 
the source of the triggering process mentioned in 
the discussion of updates to the estimates. 

The data collected during the project should be 
able to: 


1. increase individual and group knowledge 

2. improve the knowledge base 

3. add to the prior project database, and/or 

4. support the algorithm determination 
process in the individual organization. 

In summary, the model proposed is a four dimen- 
sional view of resource data. The four views in the 
data model are: 

1. RESOURCE TYPE: which is a mutually 
exclusive and exhaustive categorization 
which captures the nature of the resource. 

2. INCURRENCE: which is also mutually
exclusive and exhaustive, describing actual or
estimated resources. It carries an additional
feedback element to contain the corporate
memory explaining the difference between
the category values and differences over
time.

3. AVAILABILITY: in which each category
is a subset of the higher category, allow-
ing desirable, accessible, and utilized
resources. Again feedback is used to explain
the differences between categories and over
time.

4. USE DESCRIPTORS: which categorizes
specific elements in the nature of the 
resource use. These are the nature of the 
work done by the resource, the point in time 
of the work, and the amount of that work. 
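
Taken together, the four views suggest one record per item of resource use. The following sketch shows one possible shape for such a record and a simple total by view. The class `ResourceUse`, its field names, and the sample data are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ResourceUse:
    resource_type: str  # hardware | software | human | support
    incurrence: str     # estimated | actual
    availability: str   # desirable | accessible | utilized
    work_type: str      # use descriptor: nature of the work
    when: date          # use descriptor: point in calendar time
    amount: float       # use descriptor: extent of use, here in hours


# Invented records for illustration.
records = [
    ResourceUse("human", "estimated", "utilized", "design", date(1988, 2, 1), 80.0),
    ResourceUse("human", "actual", "utilized", "design", date(1988, 2, 1), 95.0),
    ResourceUse("hardware", "actual", "utilized", "testing", date(1988, 5, 1), 12.0),
]

# Total hours for each (type, incurrence, availability) combination.
totals = defaultdict(float)
for r in records:
    totals[(r.resource_type, r.incurrence, r.availability)] += r.amount
```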


3.4 USING THE TDC STRUCTURE 

3.4.1 At the project level 

Discussion so far has applied the proposed 4D 
structure to resource classification. It is appropri- 
ate to also consider using this structure, or a part 
of it, for the Project Environment Characteristics 
outlined in section 2 above. In this way the con- 
straints acting on the software process can be 
identified as applying: 


to a particular type of resource,
either estimated or actual,
with a stated availability,
at a point in time,
concerning a particular type of work.

An overall model of the software project is shown 
in Figure 2. In this figure the meta-entity project 
is decomposed into a number of tasks or con- 
tracts, each task consuming the meta-entity 
resource and producing the meta-entity product. 
In the implementation of this model the meta- 
entities will require many entities to characterize 
them. 



FIGURE 2. AN OVERVIEW OF THE SOFTWARE PROJECT

Thus the project has characteristics, as do the 
tasks and subtasks, the resources, and the pro- 
ducts. Characteristics at all of these levels need 
to be stored. 

Through the storage of the project characteristics, 
the constraints acting on the product or process 
determined at any time before or during the pro- 
ject can be tracked for consistency, and any 
changes noted to facilitate a relationship analysis 
between the project and the resource occurrence 
values accumulated during the process. 


A simple example of the application of this struc- 
ture would be where the process organization is 
changed during the development, say a change 
toward greater user involvement. This change 
would be reflected in a difference between the 
estimated project characteristics and those at the
point in time at which the change occurred. This 
information is then used to explain variances that 
occur in the process data, such as a changed pat- 
tern in staff utilization. 

Examples of the data stored at the project level 
would include: 


- the type of project 

e.g. real time, business application

- the project elapsed time 

- the total project effort 

- the total project cost 

- the type of development process 
e.g. evolutionary 

- the target computer 

- the development computer 

- the project deliverables 

- the project milestones 

- the project risk profile 

The application of the TDC model at this level 
provides a mechanism for storing estimates, accu- 
mulating actual values, and facilitating feedback 
and learning at the level of the project and its 
development environment. 

If we take the project milestones as an example 
and assume that the milestones apply equally to 
all resource types, then the model suggests we 
store: 


- estimated desirable milestones. This is an 
"ideal world” view of the project milestones; 
the dates at which we could deliver if we were 
not constrained. 

- estimated accessible milestones. Given the 
constraints we will be working under, these are 
the dates at which we could deliver if it were 
necessary. 

- estimated utilized milestones. These are the 
dates at which we expect to deliver, taking into 
account the dimensions of desirable and acces- 
sible. 

These three views, in their values and differences,
provide a perspective on the risk associated with 
the project; the smaller the difference between the 
categories, the higher the risk. More specifically, 
the difference between estimated desirable and 
estimated accessible shows the extent to which 
elapsed time could be changed if the constraints 
could be modified. For example, if the estimated
final desirable milestone were June 30th and the
estimated final accessible milestone were August
30th, the difference of two months measures the
estimate of the extent to which the project could
be compressed if the restricting constraints could
be removed.

The difference between the estimated accessible 
and the estimated utilized provides a measure of 
the available slack in the milestones. This 
difference is the extent to which the milestones 
could be compressed without modifying the
project constraints. In the example above, the
estimated utilized final milestone might be, say,
November 30th. In this case the difference 
between accessible and utilized of three months 
reveals the amount of elapsed time compression 
that is possible on this project without changing 
constraints. 
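
The milestone example can be worked through with calendar arithmetic. In the sketch below the year is an assumption, since the text gives only the day and month, and the variable names are chosen for this illustration.

```python
from datetime import date

YEAR = 1988  # assumed; the example in the text gives no year

desirable = date(YEAR, 6, 30)    # estimated desirable final milestone
accessible = date(YEAR, 8, 30)   # estimated accessible final milestone
utilized = date(YEAR, 11, 30)    # estimated utilized final milestone

# Compression possible only if the restricting constraints were removed.
constraint_effect = accessible - desirable

# Compression possible without changing the constraints (available slack).
slack = utilized - accessible

print(constraint_effect.days, slack.days)
```

The two differences come to 61 and 92 days, which correspond to the two months and three months quoted in the text.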

In these relationships we see some of the dynamic 
nature of the project characteristics. This suggests 
that for the TAME measurement environment, if 
a change in project characteristics such as the 
nature of the process occurs, then this should 
trigger the review of the project milestone and 
effort values, which will also be reflected at the 
lower level in the task and resource data values. 

In the actual category we need to store the:

- actual desirable milestones. As explained 
above, this category is used for feedback and 
learning. It carries the values calculated after 
project completion based on the knowledge 
gained about the project during its completion. 
This value is again an "ideal world” value. 

- actual accessible milestones. This is also a
feedback and learning category which asks:
based on the constraints which did eventuate
in the process, what milestones could have been
achieved?

- actual utilized milestones. This category stores
the dates of the milestones achieved.
Differences between actual and estimated are
stored in a feedback facility to provide a
mechanism for learning and a mechanism for
calculating the actual desirable and accessible
at project end.


3.4.2 At the resource level 

The description of the use of the TDC structure 
at the resource level amounts to a process model 
of resource planning and use in software develop- 
ment. This process can be described as an 
interacting three-stage process involving the sub- 
processes of: 

1. planning 

2. actualization 

3. review 

The planning process establishes and records the 
resource expectations or estimates before and dur-
ing the software project, and the actualization
process tracks and records the actual use of 
resources during the software project. The review 
process compares actuals with estimates for the 
purposes of modifying the estimates and learning 
from experience. In this way the feedback referred 
to above provides information for an historic 
resource database for future planning and estima- 
tion. Details of this process model are given in 
[Jeffery, Basili 87].


Application of the planning and review 
cycles 

In any particular organization, it may be deemed 
sufficient to use only a part of the planning and 
review processes outlined here, and therefore only 
a part of the TDC structure presented in this 
paper. 

For example, organizations may not wish to use
project reviews, or they may not consider it 
appropriate to carry out formal contingency plan- 
ning or risk management. At the simplest level 
only the estimated utilized and the actual utilized 
may be used, perhaps providing input to an infor- 
mal project learning process which occurs at the 
individual level. 

Specifically, it is most likely that in software
environments with very little uncertainty (say, an
implementation of the twentieth slightly different
version of a well known system) there may be no
need to explicitly consider the desirable or even
accessible dimensions of the resource model. If
uncertainty is very low, the utilized level of the
model may capture all the necessary data. The
advantage of the model in this case is that the
data is excluded in the knowledge that
there is no information in the levels not used.

In higher uncertainty environments, the model 
prompts the estimator to think explicitly of the 
resource risks and uncertainty of the development 
process, and to quantify or express that risk as a 
part of the resource database. 


4. VALIDATING THE MODEL


Three significant pieces of work in the literature 
which provide definitions of the types of data 
needed to support the measurement of the 
software process are [Penedo, Stuckle 85], 
[Tausworthe 79], and [Data & Analysis Center for 
Software 84, STARS Measurement DID Review]. 


Penedo and Stuckle (P&S) provide an excellent 
structure and content of a project database for 
software engineering environments which can be 
used here to test whether the model resulting 
from the top-down methodology employed is able 
to encapsulate all of the process data suggested by 
them as needed in a project database. Table 1 
lists the entities identified by Penedo and Stuckle 
and associates the particular model categories 
which would be used in the model derived here to 
describe them.

The first aspect which is noticed when mapping
the 31 P&S entity types to the TDC model is that
the broad structure presented in section 2 above
(The Project Environment Characteristics) is an
important link between the software process and
product. The P&S list contains entities for the
project, task, product, and resource categories of
Figure 2. In Table 1 the P&S entities such as the
requirement and risk have been categorized as 
project characteristics, while entities such as data 
component, external component, document, inter- 
face, product description, product, and software 
component have been categorized as product 
instances. 

But the focus of this paper is not on the project
or the tasks which go together to make up that
project. Rather, the focus is the resources
consumed by those tasks. In this respect we notice
that only a subset of the available TDC categories
is used in the P&S entities. For example, at the
Resource Type level we see instances of all four 
categories (Hardware, Software, Human, and Sup- 
port), but at the next level it appears that the 
P&S model concentrates on actual values. It is 
difficult to see how the P&S model stores values 
for estimates, and particularly how the informa- 
tion explaining divergence between estimate and 
actual can be stored. The same applies to the 
Availability level of the TDC structure. The P&S 
model appears to concentrate on the Utilized 
aspect and does not appear to model the other 
availability dimensions presented in the TDC 
structure. This may well be because these dimen- 
sions of resource data were considered not to be 
necessary in the environment of the P&S study. 


Table 1. P&S Database Entities in the Model Structure

Penedo & Stuckle Entities      Top Down Model Categories
-----------------------------  -----------------------------------------------
Accountable Task and Contract  The task and contract are the convergence of
                               process and product and subsets of the project.
                               It is in a contract or task that resources are
                               consumed to produce the product. They are not,
                               therefore, resource entities.
Change Item                    This item is generally associated with a
                               product change.
Consumable Purchase            •Support resource, incurrence and availability
                               not specified.
Data Component                 Product Entity
Dictionary                     •Software resource, or perhaps product entity
Document                       Product Entity
Equipment Purchase             •Hardware resource
External Component             •Hardware resource or perhaps Product Entity
Hardware Architecture          •Hardware resource or perhaps product entity
Hardware Component             •Hardware resource or product entity
Interface                      Product Entity
Milestone                      •Project Entity
Operational Scenario           Product Entity
Person                         •Human Resource
Problem Report                 •Process as part of feedback or Product entity
Product                        Product Entity
Product Description            Product Entity
Requirement                    Project Entity
Resource                       •Support resource
Risk                           •Project Entity
Simulation                     Product entity
Software Component             Product Entity
Software Configuration         Product Entity
Software Executable Task       Product Entity
Software Purchase              •Software resource
Test Case                      •Software resource and/or product entity
Test Procedure                 •Task or project characteristic
Tool                           •Software resource
WBS Element                    Project Decomposition Entity, may be the same
                               as accountable task and contract


It remains to be seen, of course, whether all of the 
categories available in the TDC structure are 
deemed necessary in any particular environment. 
However, the advantage of such a structure is 
that exclusion of certain categories of data occurs 
explicitly rather than implicitly. 

The second model suggested as a means of testing
the TDC model is that provided by [Tausworthe
79]. In this work the model’s entities are not
presented in a list form, but are included in text
discussion and report forms. For this reason it has
been necessary to convert the form to a list of
entities. In doing so it is always possible that
misconceptions of Tausworthe’s ideas may be
present. However, even if incomplete, it provides
another test of the suitability of the TDC model.

The Tausworthe structure is very much oriented 
towards a decomposition of the project into tasks 
and the association of resources with those tasks. 
Thus the modelling approach used by Tausworthe 
is somewhat at a tangent to the modelling 
approach used here since once again our focus is 
on resources, not the activities which consume 
those resources. This is not to say, however, that
it is not necessary to associate resources with
tasks, but that it may be necessary to model
resources apart from the tasks that consume them
in order to better understand all of the
dimensions of resource data.

The entities listed here are a partial list derived
from the work breakdown structure, the software
technical progress report, the software change
analysis report, and the software change order of
Tausworthe’s model. From these sources the
following resource data, among others, were
identified as necessary to establish a resource
database. Only some of the Tausworthe entities
have been listed here. This has been done to the
extent that is necessary to illustrate the
conclusions drawn.

From Table 2 it is clear that the focus of
attention in the Tausworthe work is the project and
the decomposition of that project into its com- 
ponent parts. Thus we see that the resource data 
is associated with particular tasks and activities. 
In viewing the data in this way a structure is pro- 
vided which is excellent for control purposes, in 
that it establishes units of accounting which are 
more easily estimated and controlled. What is not 
clear from the structure, however, is how ques- 
tions of desired versus accessible resources can be 
modelled, nor exactly how actual versus estimated 
can be compared and conclusions stored for use in 
later project estimates. It is also difficult to see
how the model proposed in the WBS can easily
facilitate the analysis of resources consumed on a 
particular activity type (say inspections), regard- 
less of the project phase in which the inspections 
were done or the project task in which they were 
done. Thus questions such as the value to the pro- 
ject of using a particular form of inspection may 
be difficult to answer because the data model may 
make this data difficult to isolate. 
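
One way to make such questions answerable is to tag each resource record with its activity type independently of the phase and task in which the work was done. The following Python sketch illustrates the idea only. The record fields, the sample data, and the function name are hypothetical and are not taken from Tausworthe's forms or from the TDC model definition.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceRecord:
    """One resource expenditure; all field names are illustrative assumptions."""
    task_id: str
    phase: str
    activity_type: str
    staff_hours: float


def hours_by_activity(records):
    """Sum staff hours per activity type, ignoring task and phase."""
    totals = defaultdict(float)
    for record in records:
        totals[record.activity_type] += record.staff_hours
    return dict(totals)


# Hypothetical sample data, invented purely for illustration.
records = [
    ResourceRecord("T1", "design", "inspection", 6.0),
    ResourceRecord("T2", "implementation", "inspection", 4.5),
    ResourceRecord("T2", "implementation", "coding", 30.0),
    ResourceRecord("T3", "test", "inspection", 2.0),
]

totals = hours_by_activity(records)
print(totals["inspection"])
```

Because the activity type is stored on the record rather than implied by its position in the work breakdown structure, the inspection effort can be isolated even though it is spread over three tasks and three phases.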


Perhaps the most detailed resource data collection
developed so far has been that of the
STARS Measurement Data Item Descriptions.
The information which follows in Table 3 is
derived from STARS Software Development
Environment Summary Reports DI-E-SWDESUM,
DI-F-RESUM, DI-F-REDET [06 JULY 1984].
These reports contained information most 
relevant to the task of validation of the TDC 
model. The data suggested as necessary by these 
reports concerned aspects of the project, the pro- 
cess, and the product. In this paper only those 
aspects concerning the project and the process 
have been listed. As with the Penedo and the 
Tausworthe models, the data model implied in the 
work appears not to have been developed on the 
basis of a theoretical structure, but rather from a 
pragmatic evaluation of those data items deemed 
necessary for project management. In addition, 
because the data items are
listed in the context of data capture forms, some
rearrangement of these items has been carried out 
in the following data list in order to provide a 
clearer presentation of these items. 


Table 2. Tausworthe Derived Entity List

Tausworthe Entities        Top Down Model Categories
-------------------------  -------------------------------------------------
Staff:                     Human resource, estimated or actual
  Staff I.D.
  Staff Name
  Staff Phone
Task Activity:             The dollar value may be a sum of all resources
  Task I.D.                consumed on a task-activity, estimated or actual
  Task Activity I.D.
  Budget $
Task:                      The value is a sum of all resources, estimated
  Task I.D.                and/or actual
  Task Name
  Task Dates
  Task M'gr
  Task Budget $, etc.
Software Change Order:     The focus is again on the activity. The resources
  S/ware ID                may be any type, estimated or actual.
  Change Order #
  Activity ID
  Person ID
  Description
  Start Date, etc.


However, it is clear that the resource data sug- 
gested as necessary by Tausworthe are readily 
modelled in the TDC structure. The importance
of the application of the TDC model to the
project and task level is highlighted by Tausworthe
and also Penedo & Stuckle, so that the
association of resource data and project work breakdown
structures can be facilitated.

Table 3. STARS Measurement Data Item Descriptions


A. PROJECT NAME 

Project Name 
Contractor - 

Contract No. 

Start date, Finish Date 

Software Level (System, Subsystem, CSCI) 

Application Type 

Application description 

Revision of current project (y/n) 

Revision -version no. 

% of software redeveloped 
Total no. lines of source code 
Initial development (y/n) 
if y - Total no. lines source code 
no. of instructions 
no. of data words 
System Structure -
single overlay
multiple overlay
(# overlays, avg. size bytes)
independent subsystems
(# subs, avg. size bytes)
virtual memory system
(amount of addressable memory, size bytes)
Programming language and % used
Constraints - 
Execution Time, rating 
Main memory size, rating 
Product Complexity, rating 
Database size, rating 
Methodology, rating 
required reliability, rating 
Other, rating 

Concurrent Hardware development (y/n) 
Operational site development (y/n) 

Multiple site development (y/n) 
no. of development sites 
no. of test sites (if different) 

Other Constraints (text),
cost estimation assumptions made 
cost estimation methods used and supporting 
rationale 

rationale for discrepancies between current
estimates and all previous estimates 


B. SITE CONFIGURATION INFORMATION 
Site ID 

Description (development, test) 

Computer manufacturer 
Model name 
Model no. 

no. of persons accessing site 

no. of input terminals 

Terminals in each programmers office (y/n) 

Input terminals in central area (y/n) 
no. of card readers 
no. of printers 
no. tape drives 
no. disk drives 
other peripherals (specify)
no. documentation sets on hardware/software
environment available 
no. site support personnel 
amount of storage in development computer 
main memory real 
main memory virtual 
aux memory 

DEVELOPMENT SITE ACCESS 
Site I.D.

Access type: % batch 

% interactive 

Average job turnaround time 

no. hours per day development site available 

no. days per week development site available 

no. hours per day utilized 

no. days per week utilized 

TEST SITE ACCESS 
Site I.D.

no. hours per day test site available 
no. days per week test site available 
no. hours per day test site utilized 
no. days per week test site utilized

C. PROJECT PHASE INFORMATION
[examples] 

requirements 

Development system used (y/n) 

Documents maintained on the dev. system (y/n) 
Methodology (formal spec., functional spec.,
procedural spec., English spec., none, other)
Tools/Formalisms (requirements analyzer, word 
processor, on-line editor, c.m.t., librarian, 
spec lang, PDL, none, other) 
start and finish date 
deliverables 

design 

Development system used (y/n) 

Documents developed/maintained on system (y/n) 
Methodology (top down, bottom up, hardest 
first, prototyping, iterative enhancement, 
none, other) 

Tools/Formalisms (software dev. folders,
design reviews, walkthru’s, flow charts,
HIPO, etc.)
start and finish date 
deliverables 

Implementation 

Development system used (y/n) 

Documents maintained on development system (y/n) 
Unit testing performed on dev. system (y/n) 
Methodology (top down, cpt, prototyping, etc.) 
Tools/Formalisms (code reading, pre-compiler,
dbms, etc) 
start and finish date 
deliverables 

test and integration

Testing performed on development system (y/n) 
Documents maintained on system (y/n) 

Level of testing performed on dev system 
Methodology (spec driven, top down, none, etc) 

Tools/Formalisms ( )

start and finish date 
deliverables 

D. PROJECT PERSONNEL INFORMATION 

[these values can be derived from more detailed 
records] 

Project Name 

Job Classification (supervisor, consultant, 
analyst, programmer, site operator, 
librarian, other) 

Avg. no. years application experience 

Avg. no. years experience with software 

Avg. no. yrs software training 

Avg. no. yrs programming language experience 

Avg. no. yrs hardware experience 

Avg. capability rating 

communication 

Regular project status meetings (y/n) 

How often? 

Persons typically in attendance 
(classification, No.) 


E. RESOURCE EXPENDITURE ATTRIBUTES 

summary level 

[these values may be derived] 

Project name 

total system cost, estimated, actual 
total software cost, estimated, actual 
total labour cost $, estimated, actual 
total software labour cost $, estimated, actual 
total labour hours, estimated, actual 
total software labour hours, estimated, actual 
total staff size, start, finish, estimated, 
actual 

total software staff size, start, finish, 
estimated, actual 

total computer costs $, estimated, actual 
total software computer costs $, estimated, 
actual 

total computer hours, estimated, actual 
total travel costs $ 
total material costs $ 
total miscellaneous costs $ 

[these may be divided by milestones or activities] 
labour costs

[these values may be derived] 

labour category id 
total hours
no. of people, start, finish 
cost $ 

computer hours 
computer costs $ 

computer costs

[these values may be derived]

no. of computers used 
no. of different types of computers 
total computer hours 

*** for each computer*** 

computer i.d. 
number of hours 
total computer costs $ 
cost of each computer $ 

task costs 

[these values may be derived] 

task i.d. 
definition 
personnel costs 
software costs 
hardware costs 
supplies costs 

****for each task**** 

****for each labour category**** 

total hours 

no. of people, start - finish 
cost $ 

computer hours 
computer cost $
travel cost $


****for each task**** 

total cost of labour 
total hours of labour 
total cost of computer 
total hours of computer
total cost of travel 
total cost of materials 
total cost of miscellaneous

The table provides data items to describe the
project, development and test site configurations 
and access, project phases, personnel assigned to a 
project, and resource expenditure summaries. The 
detail shown here has been selected to highlight 
the volume of data items which will be necessary 
in a measurement system. 

In terms of the TDC model, the STARS list shows 
recognition of the need to store resource availabil- 
ity in that the development and test site access 
data includes an accessible and a utilized dimen- 
sion. There appears, however, to be no facility for 
storing the desirable dimension suggested in the 
TDC model. The STARS list also shows extensive 
use of the incurrence dimension in section E - 
Resource Expenditure Attributes, wherein 
estimated and actual resource use is tracked. The 
USE DESCRIPTORS of work type, point in time, 
and resource utilized are also extensively used in 
the STARS list. It is not possible from the docu- 
mentation, however, to determine the reasons that 
the availability dimension was not applied more 
extensively in the data model (for example,
accessibility of personnel or specific hardware or
software items is not modelled). It can be
assumed that it was considered to be
inappropriate for entities other than site access.

The STARS data list provides considerable sup- 
port for the theoretical structure provided in the 
TDC model. It reveals a considered need for the 
storage of:

1. Project information 

2. Resource type information 

3. Incurrence information 

4. Availability information and 

5. Use descriptors 
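
The five kinds of information listed above can be pictured as the fields of a single resource record. The following Python sketch is one possible rendering, not the authors' schema. The category values (four resource types, estimated and actual incurrence, desirable, accessible, and utilized availability, and the three use descriptors) come from the text; the class names, field names, and the sample record are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class ResourceType(Enum):
    HARDWARE = "hardware"
    SOFTWARE = "software"
    HUMAN = "human"
    SUPPORT = "support"


class Incurrence(Enum):
    ESTIMATED = "estimated"
    ACTUAL = "actual"


class Availability(Enum):
    DESIRABLE = "desirable"
    ACCESSIBLE = "accessible"
    UTILIZED = "utilized"


@dataclass(frozen=True)
class UseDescriptor:
    """Work type, point in time, and resource utilized (names assumed)."""
    work_type: str
    point_in_time: str
    resource_utilized: str


@dataclass(frozen=True)
class ResourceDatum:
    """One value in a resource database; the layout is a hypothetical sketch."""
    project: str
    resource_type: ResourceType
    incurrence: Incurrence
    availability: Availability
    use: UseDescriptor
    value: float
    unit: str
    reason: str = ""  # non-numeric explanation stored alongside the value


# Hypothetical record: actual, utilized test-site hours.
datum = ResourceDatum(
    project="Example project",
    resource_type=ResourceType.HARDWARE,
    incurrence=Incurrence.ACTUAL,
    availability=Availability.UTILIZED,
    use=UseDescriptor("testing", "week 12", "test site"),
    value=6.0,
    unit="hours/day",
)
print(datum.resource_type.value, datum.incurrence.value, datum.availability.value)
```

Keeping each dimension as a separate field means a schema that omits one (for example, the desirable availability view) does so visibly, by never using that value.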

Of considerable significance is the fact that none
of the three schemas considered here has
suggested data entities or items which cannot be
successfully modelled using the TDC structure. The
schemas considered here may be incomplete
when compared with the TDC structure. The
reasons for the apparent exclusion of data entities
and items are not known, but they may be purely
pragmatic.

5. CONCLUSIONS AND IMPLICATIONS 
AT THE RESOURCE DATA LEVEL 

The model presented here is meant to be general
and to provide a perspective for the project manager
and the organization in identifying and tracking
resources. It should help in better understanding the
compromises made in resource allocation. However,
it is assumed that any project (or even
organization) will work with a subset of this
model. For example, one might limit the number
of availability views, such as combining desirable
and accessible, or track only a subset of the
resource categories. The subsetting process
provides feedback on what has not been tracked. The
actual data collected is driven by the
Goal/Question/Metric paradigm based upon the
goals set by the project and the organization.


The conclusions to be drawn from this research 
can be divided into two categories: those concern- 
ing the model itself, and those concerning the 
validation of that model. 

In terms of the model itself, the discussion has
suggested storage of resource data of a type which
has significant storage and access implications:
that of numeric and non-numeric project and
resource data. It has been assumed in the discus- 
sion that the resource database is able to store 
not only numeric resource values, but also reasons 
for those values along with the resource environ- 
ment characteristics. 

A system using these suggestions should be able 
to efficiently search the numeric and non-numeric 
data in a manner which will eventually enable the 
system to propose reasons for numeric variances 
which occur in the database. In this way the sys- 
tem must be able to not only highlight a 
significant variance, say between an estimated 
and an actual resource occurrence value, but it 
should also be able to search the project charac- 
teristic database and the numeric and non- 
numeric resource classification database in order 
to propose or associate reasons for the variance. 
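
The first half of this behaviour, highlighting a significant variance and returning whatever reasons are stored with it, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the 25% threshold, the dictionary layout, and the sample entries are hypothetical, and the search of the project characteristic database described above is not attempted.

```python
def relative_variance(estimated, actual):
    """Return (actual - estimated) / estimated."""
    if estimated == 0:
        raise ValueError("estimated value must be non-zero")
    return (actual - estimated) / estimated


def flag_variances(entries, threshold=0.25):
    """Return entries whose relative variance exceeds the threshold in magnitude.

    Each result carries the stored non-numeric reasons, if any, so that a
    variance is reported together with its recorded explanation.
    """
    flagged = []
    for entry in entries:
        variance = relative_variance(entry["estimated"], entry["actual"])
        if abs(variance) > threshold:
            flagged.append({
                "resource": entry["resource"],
                "variance": variance,
                "reasons": entry.get("reasons", []),
            })
    return flagged


# Hypothetical entries, invented purely for illustration.
entries = [
    {"resource": "staff hours", "estimated": 1000.0, "actual": 1400.0,
     "reasons": ["requirements changed late"]},
    {"resource": "computer hours", "estimated": 200.0, "actual": 210.0},
]

flagged = flag_variances(entries)
print([item["resource"] for item in flagged])
```

The threshold is a design choice that each organization would have to set for itself; the point of the sketch is only that numeric values and their non-numeric reasons are retrieved together.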

It can be said that the model presented here has 
four broad implications : 

1. It proposes a resource categorization 
which will allow project database designers to 
explicitly consider the content of that database 
against a model of the resource environment. In 
this way, a particular individual’s view of the 
resource data can be positioned in a context and 
compared with other external views of the same 
data. This model should motivate the resource
data user to consider the measures that may be
beneficial in seeking improvement in the
particular process goals.

2. It suggests a project management 
system's environment which will be able to 
achieve far more in terms of management support 
than any known environment available today. It 
is able to do this because of the extent and 
dynamic nature of the model of the resource data 
proposed. 

3. It provides a resource categorization 
which can be used when considering relationships 
between tasks or contracts and resources. 
Specifically it provides a focus for the considera- 
tion of the resources consumed within a task. 

4. It provides assistance when applying the 
Goal/Question/Metric process paradigm, so that 
questions which answer the resource purpose of 
the study are highlighted and the measures 
appropriate to those questions are suggested. 

In terms of the validation of the data model we
have seen by reference to three published models
that the proposed theoretical structure for
resource data is able to encompass all that has
been suggested as necessary for resource
management. Also of significance is the fact that each of
the publications used contains different views of
the necessary data and that each one omits
certain elements that the others appear to consider
of benefit. This is, of course, the norm in
comparing different external views in a database design
exercise. One advantage of the TDC model is that
it is able to act as a data model template,
suggesting the data categories which need to be
considered when designing a resource data schema. If
it is used in this way the data items excluded
from the particular resource model instance will
have been excluded on the grounds that they are
deemed unnecessary in the particular
environment, rather than being excluded because the
category of data (for example, estimated
desirable hardware for testing) was not noticed by the
database designers as necessary.
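
The template idea can be made concrete by enumerating every combination of category values and listing those a given schema leaves out. The Python sketch below assumes that the template is simply the cross product of the incurrence, availability, and resource type values named in this paper; that reading, the function names, and the sample schema are illustrative assumptions rather than part of the TDC definition.

```python
from itertools import product

RESOURCE_TYPES = ("hardware", "software", "human", "support")
INCURRENCE = ("estimated", "actual")
AVAILABILITY = ("desirable", "accessible", "utilized")


def template_categories():
    """All (incurrence, availability, resource type) combinations."""
    return set(product(INCURRENCE, AVAILABILITY, RESOURCE_TYPES))


def excluded_categories(schema_categories):
    """Template categories that a schema does not cover, in sorted order."""
    return sorted(template_categories() - set(schema_categories))


# Hypothetical schema that tracks only the utilized view of each resource.
schema = {(inc, "utilized", res) for inc in INCURRENCE for res in RESOURCE_TYPES}

excluded = excluded_categories(schema)
print(len(template_categories()), len(excluded))
```

The list of excluded categories is the explicit record of what the designers chose not to track, which is the benefit claimed for using the model as a template.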

Thus we can be confident that the theoretical 
model proposed in the TDC structure can contain 
all of the project and resource data so far
suggested in the literature as necessary in a resource
management environment. In addition it appears 
that there may be project and resource informa- 
tion of use in resource management which has not 
been included in prior models. The practical need 
for this additional information has not been 
justified in this piece of research but is the subject 
of other current work by the authors. 

We have begun to apply the model independent 
of TAME in a couple of industrial environments 
and have found it provides a useful framework for 
planning and tracking resources throughout a pro- 
ject. We have not yet reached the stage where we 
have been able to evaluate the feedback process, 
however.


EXPERIENCES IN THE IMPLEMENTATION OF A LARGE Ada PROJECT

BACKGROUND

During the past several years, the 
Software Engineering Laboratory (SEL) of 
Goddard Space Flight Center has been 
conducting an experiment in Ada [6], [8] to 
determine the cost effectiveness and 
feasibility of using Ada to develop flight 
dynamics software and to assess the effect 
of Ada on the flight dynamics environment. 
This experiment consists of near parallel 
developments of a dynamics simulator in both 
FORTRAN and Ada. A study team consisting of 
members from the SEL has monitored 
development progress and has collected data 
on both projects throughout their 
development. 

Both the Ada and the FORTRAN teams 
began work in January, 1985, using the same 
set of requirements and specifications to 
develop their simulators. The FORTRAN 
dynamics simulator team completed acceptance 
testing by June, 1987, after following a 
development life cycle typical of projects 
in the flight dynamics environment [5]. The 
development was carried out on a DEC VAX- 
11/780 and the completed FORTRAN dynamics 
simulator consists of about 45,000 source 
lines of code. 

The Ada development began with a period 
of training [7] in both the Ada language and 
the methodologies appropriate for Ada [11]. 
The team was not previously experienced in 
Ada, although they were more experienced 
than the FORTRAN team in both the number of 
years they had programmed (8.6 years 
compared to 4.8 for the FORTRAN team) and 
also in the number of languages they knew (7 
compared to 3). The Ada team was also 
experienced in more types of software 
applications, but only 43% of the Ada team
had previous dynamics simulator experience
compared to 66% of the FORTRAN team.

Following the training period, the Ada
team began a phase of analyzing the
requirements and then they began design
using an object oriented methodology called
GOOD (General Object Oriented Design) which
was developed by the team during the
training and design phases. More
information on GOOD and the lessons learned
during the design phase can be found in [2],
[4], and [10].

Coding and unit testing began in April, 
1986, on a DEC VAX 8600 and continued 
through June 1987. The Ada project has 
completed system testing and consists of
approximately 135,000 source lines of code (see footnote 1).
This paper will describe some of the 
similarities and differences of the two 
projects and will discuss some of the 
interesting lessons learned during the 
code/unit test and integration phases of 
this project. 

INFORMATION COLLECTION 

The information presented in this paper
was collected by using the following four
methods: 1) collection of SEL forms,
2) interviews, 3) observation of development, and
4) code analysis. The SEL forms solicit such
information as a detailed breakdown of the
hours spent by programmers, managers, and
support staff on a project and detailed
information on changes and errors which
occurred during the development. During the
course of the project, over 2000 forms were
collected; about 625 of these documented
errors and changes.


1. A source line of code is defined to be
any 80 byte record of code including
commentary, blank lines and executable code.

Each member of the Ada team (11 total)
was interviewed individually to gain some 
insight into the experiences he or she had 
during implementation. Team members were 
asked questions concerning ease or 
difficulty of implementing features, unit 
testing, integration, correcting errors, 
using tools, etc. Questions concentrated on 
an individual’s particular area of work, but 
general subjective questions were asked of 
the entire team. Observation of the 
development was accomplished by attending 
reviews and regular implementation meetings 
held by the team. These regular 
implementation meetings were actual working 
meetings in which team members discussed 
progress, solved implementation problems, 
clarified interfaces, shared knowledge, and 
planned implementation strategies. In 
addition, much information was gained 
through informal conversations with the team 
on implementation progress. Information
received through code analysis was actually 
collected two ways. First, the code was 
examined to tabulate such attributes as 
number of modules, number of lines of code, 
number of comments, etc. Second, another Ada 
team, in the process of Ada training, 
performed code reading on parts of the 
dynamics simulator code as a training 
exercise and they provided their comments on 
the code. 

The remainder of this paper will 
concentrate on some interesting comparisons 
between the FORTRAN and the Ada projects and 
some of the major lessons learned during the 
implementation phase of the Ada project. 

1. FORTRAN/Ada PROJECT COMPARISONS 

Several factors need to be considered 
when trying to directly compare metrics from 
the FORTRAN project and those from the Ada 
project. First, the FORTRAN project was 
considered to be the "real" operational 
version of the dynamics simulator being 
developed, and as such, it was necessary for 
that project to meet the schedules imposed 
by an impending launch date. The Ada team, 
on the other hand, was allowed a more 
relaxed schedule for development which 
included adequate training time, time to 
experiment with design methodologies, and 
finally, time to recode or enhance if 
"better" methods occurred to the developers. 
One result of this extra time was the 
development of a much more sophisticated 
user interface for the Ada project.


Second, this general type of dynamics 
simulator was a very well-known application 
for the FORTRAN team since similar 
simulators have been built repeatedly in
this environment. Thus, the general design 
of the FORTRAN simulator was reused from 
previous designs and was known to be a very 
satisfactory design for the application. In 
addition to the design, much of the code was
reusable, about 36%. The Ada team developed
a new design [1] which they felt was more 
suitable for Ada and which they felt more 
accurately represented the actual physical 
system they were trying to simulate. While 
this design may be a better physical 
representation of the problem, it did not 
have the advantage of previous use to refine 
and correct any possible problems. No Ada 
code was available for reuse but several 
FORTRAN routines were used by the Ada team. 
These comprised only about 2% of the code. 

Keeping in mind these differences in 
the actual projects, we will discuss some 
interesting FORTRAN/Ada comparisons. 

1.1 Size of Ada project is larger than 
FORTRAN project. 

As mentioned in the background section, 
a simple count of the number of lines of 
code, including every line of any type as a 
line, yields a count of 135,000 source lines 
of code for the Ada project and a count of 
45,500 source lines of code for the FORTRAN 
project. These figures are somewhat
misleading, since the Ada line count
includes 23,000 blank lines which
are inserted for readability. Also, the Ada
count includes 49,000 lines of comments
compared to 19,500 lines of comments in the
FORTRAN count. When the numbers of executable
lines of code are compared, we find that the
Ada project has 63,000 lines of executable
code compared to 25,500 for the FORTRAN
project.

In these particular projects, there
were other reasons why the Ada project was
larger. As we mentioned earlier, the Ada
project was not constrained by schedule
pressure and so the team developed a system with
more functionality, a system with more of
the "nice to have, but not required"
features. Naturally this increased the size
of the system. To some extent, the Ada
language itself was a driving factor for the
size difference, since it requires more code
to write such constructs as package
specifications, declarations, etc. In
addition, the Ada team used a style guide
[3] that required certain constructs to be
spread over several lines of code for
readability.

Another interesting way to compare the
size of the two projects is to examine the
size of the load modules for each one. This
also shows the Ada system to be larger,
occupying 2300 512-byte blocks, compared to
953 512-byte blocks for the FORTRAN load
module.
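
The size figures quoted in this section can be cross-checked with a little arithmetic. The Python sketch below uses only numbers stated in the text; the dictionary layout and the ratio helper are illustrative choices, not part of the original analysis.

```python
# Line counts and load module sizes as quoted in section 1.1.
ADA = {
    "total": 135_000,
    "blank": 23_000,
    "comments": 49_000,
    "executable": 63_000,
    "load_blocks": 2300,   # 512-byte blocks
}
FORTRAN = {
    "total": 45_500,
    "comments": 19_500,
    "executable": 25_500,
    "load_blocks": 953,    # 512-byte blocks
}


def ratio(key):
    """Ada-to-FORTRAN ratio for one size measure."""
    return ADA[key] / FORTRAN[key]


# The three Ada components account for the whole Ada line count.
ada_components = ADA["blank"] + ADA["comments"] + ADA["executable"]

for key in ("total", "executable", "load_blocks"):
    print(key, round(ratio(key), 2))
```

The ratio falls from roughly 3 to 1 on total lines to roughly 2.5 to 1 on executable lines, and the load module ratio is close to the executable line ratio.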

1.2 Project cost is similar for the two 
implementations. 

One of the problems with trying to 
compute productivity is that there are 
many ways to compute it. Usually, in the 
Software Engineering Laboratory, the 
calculation is made by taking the total 
number of source lines of code developed and 
dividing by the number of hours spent on the 
project. The number of hours is carefully 
recorded on forms weekly and includes the 
hours spent on all phases of the project 
beginning with requirements analysis and 
ending with the completion of acceptance 
testing. In order to compare the FORTRAN and 
Ada projects, the calculations were made 
using the number of hours spent on each 
project from requirements analysis to the 
completion of system testing since 
acceptance testing has not yet been 
completed on the Ada system. As we see in
Figure 1, using the total number of source
lines of code (SLOC) for each project, we
get a productivity of 3.8 SLOC/hr. for the
FORTRAN project and a productivity of 6.2
SLOC/hr. for the Ada project. Remembering
that the Ada code included many blank lines
of code that were not included in the
FORTRAN line count, we recomputed the Ada
figure, excluding the blank lines, and got a
productivity of 5.2 SLOC/hr. When we
considered the effort required just to
develop new lines of code and not the
reusable code, the figures are 2.7 SLOC/hr.
for FORTRAN and 6.1 SLOC/hr. for Ada with
blanks and 5.0 SLOC/hr. without blanks. This
would seem to imply that Ada is more
productive, but we must remember that it
took many more lines of code to develop the
Ada system and that the style guide caused
many Ada constructs to be spread over
several lines.

Let’s look at the figures when we 
consider only executable lines of code. 
Using only the number of lines of code which 
are executable, we got a productivity figure 
of 2.14 SLOC/hr. for the FORTRAN project and 
2.8 SLOC/hr. for the Ada project. When we 
considered that many of the Ada constructs 
use more than one line, we looked at the 
number of executable statements ' (or 
semicolons) in the Ada project and 
recomputed productivity. Similarly for the 
FORTRAN, we counted statements and their 
continuations as one executable statement. 
Now we get a productivity of 1.85 SLOC/hr. 
for the FORTRAN project and 0.96 SLOC/hr. for
the Ada project. Looking at the number of
executable new statements in the FORTRAN
yields a figure of 1.2 SLOC/hr. compared to
0.95 SLOC/hr. for the Ada project. These
calculations would make FORTRAN look more 
productive. 
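
The effect of the counting convention can be seen in a
small sketch. The Python below is an illustration only,
not the tool used by the SEL: the sample fragment, the
effort of 2 hours, and the function names are all
hypothetical, and the statement counter is deliberately
crude (one statement per semicolon).

```python
def count_lines(source: str) -> dict:
    """Count one source text under three of the conventions discussed above."""
    lines = source.splitlines()
    total = len(lines)
    nonblank = sum(1 for line in lines if line.strip())
    # Crude statement count: one statement per semicolon (Ada terminator).
    # A real counter would have to skip semicolons in comments, strings,
    # and parameter lists.
    statements = source.count(";")
    return {"total": total, "nonblank": nonblank, "statements": statements}


def productivity(count: int, hours: float) -> float:
    """Units of code produced per staff hour."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return count / hours


# Hypothetical Ada-like fragment, laid out the way a style guide might
# require: blank lines for readability, one argument per line.
SAMPLE = """\
procedure Update (State : in out Vector) is
begin

   Integrate
     (State,
      Step_Size);

   Log (State);
end Update;
"""

counts = count_lines(SAMPLE)
HOURS = 2.0  # hypothetical effort, chosen only to make the arithmetic visible
rates = {name: productivity(n, HOURS) for name, n in counts.items()}
```

For this fragment the same two hours of effort yield 4.5
lines/hr counting every line, 3.5 lines/hr excluding
blanks, and 1.5 statements/hr counting semicolons, which
is the kind of spread seen in figure 1.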


Lines of Code                           FORTRAN          Ada
Used for Computation                    Productivity     Productivity

Total lines of code                     3.8 SLOC/hr      6.17 SLOC/hr
Total lines of code excluding blanks    3.8 SLOC/hr      5.12 SLOC/hr
Executable lines of code                2.14 SLOC/hr     2.8 SLOC/hr
New lines of code                       2.7 SLOC/hr      6.08 SLOC/hr
New lines of code excluding blanks      2.7 SLOC/hr      5.03 SLOC/hr
Executable statements                   1.85 SLOC/hr     0.96 SLOC/hr
Executable “new” statements             1.2 SLOC/hr      0.95 SLOC/hr

Figure 1: Productivity Comparisons 

Perhaps a better way of viewing the
productivity problem is to examine it from 
the standpoint of cost to produce the 
product. The total cost of the FORTRAN 
project from requirements analysis through 
acceptance testing was about 8.5 man-years 
of effort. The Ada project cost, using 
actual figures from requirements analysis 
through system testing and estimating the 
acceptance testing cost, is around 12 man- 
years of effort. When we take into 
consideration the percentage of reused code 
in the FORTRAN project and assume all the 
code generated was new code, it would have 
taken about 11.5 man-years of effort to 
develop the FORTRAN system. This makes the 
cost of developing the two systems roughly 
the same, especially when we consider that 
the Ada project was a "first-time" project 
and that the Ada project had slightly more 
functionality than the FORTRAN. 

1.3 Error types found in both projects 
show similar profiles. 

Detailed information was kept on the 
types of errors found in both projects and 
based on 104 forms collected for the FORTRAN 
project and 174 forms collected for the Ada 
project, the error types show a similar 
profile. Figure 2 shows the distribution of 
error types for each project. 


Error Type (a)              FORTRAN (b)    Ada (c)
                                 %             %

Computational                   12             9
Initialization                  15            16
Data Value or Structure         24            28
Logic/Control Structure         16            19
Internal Interface              29            22
External Interface               4             6

(a) There may be more than one error reported on a form.
(b) 104 forms
(c) 174 forms

Figure 2: Error Profile 


An example of a computational error 
might be an error in a mathematical 
expression. An error like using the wrong 
variable would have been classified as data 
value or structure error. Internal Interface 
errors refer to errors in module to module 
communication, while external interface 
errors refer to errors in module to external 
communications. 

Perhaps one result here that is
surprising is that the team expected to have
fewer internal interface errors with Ada,
but the percentage is not significantly 
different from the FORTRAN. When the 
detailed information on the Ada errors was 
examined, we learned that many of the errors 
classified as internal interface errors were 
caused by a type change of some sort. For 
example, a variable may have been classified 
as one type in one portion of the code and a 
different type in another, or the original 
type chosen for a variable might not have 
been suitable. Another common reason that 
internal interfaces were changed was that a 
new function was added to the module which 
required an interface change. Also, in some 
cases, a developer would find he needed 
another variable from some other module 
which he did not originally think he needed. 

1.4 The percentage of "very easy to
find" errors was lower in the Ada project
than in the FORTRAN project.

Detailed information was captured on
the effort required to isolate errors. The
error levels were categorized as a) very easy
or less than one hour, b) easy or one hour to
one day, c) hard or one to three days, and
d) very hard or more than three days. The
FORTRAN team found that 81% of their errors 
were in the "very easy" to isolate category. 
In comparison, the Ada team found only 59% 
of their errors in that category. There are 
several possible explanations for this. 
First, many of the errors found by the 
FORTRAN team were types of errors which 
would have been identified by a more
rigorous compiler such as the Ada compiler.
Throughout the project, the Ada team felt
that the compiler was one of the most useful
tools because it was able to pinpoint many
errors at the early stage of compilation.
Another possible explanation for the
difference in effort to locate errors is the
difference in experience of the teams with
the language. The Ada team was not
experienced in Ada and did not feel they had
the same intuition as the FORTRAN team did 
to aid in isolating errors. 

2. MAJOR LESSONS LEARNED DURING
IMPLEMENTATION OF THE Ada PROJECT 

2.1 A flat structure usually has more
advantages than a nested structure. Thus, 
nesting should be used sparingly. 

The object oriented design used by the 
team [9] seemed to promote a nested 
structure for information hiding purposes. 
While the nesting was not explicitly 
specified in the design, it seemed to be a 
natural manifestation of the object oriented 
design, so the parts of an object or a
package would be included inside that
package instead of being called in from the
outside. The team felt that they were
implementing nesting conservatively, and
indeed, one view of the system shows that it
has 124 packages of which 55 are library
units. However, the nesting in the system
was extensive, many levels deep in some
places.

This amount of nesting caused many 
problems for the Ada developers. First, 
nesting increased the amount of 
recompilation necessary during 
implementation and testing. Many more units 
had to be recompiled when changes were made 
to the system since Ada assumes dependencies 
between nested objects or procedures even 
when there are none. Since compilation is a 
lengthy process, this slowed down the 
development process. Much unnecessary
recompilation could have been avoided by the 
use of more library units. 

Second, nesting increased the difficulty 
of unit testing. In fact, the greater the 
level of nesting, the more difficult the 
unit testing was. The lower level units were 
not in the scope of the test driver, and a 
debugger was necessary to "see" into these 
lower level units. For the purposes of unit 
testing in FORTRAN, a unit is defined as a 
subprogram. When this same definition was 
applied to the Ada, unit testing 
difficulties arose since many of these units 
could not be tested in isolation. Instead,
it was necessary to integrate units which
fit logically together, usually integrating
up to the package level, before testing was
done. Nesting also increased the difficulty
of tracing problems since it is hard to
identify the calling module of a nested
unit.
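
The scoping problem for test drivers can be illustrated
by analogy. The sketch below is Python, not Ada, and its
function names are hypothetical; it shows only the
visibility issue (a nested helper cannot be reached by a
driver, a top-level one can) and does not model Ada's
recompilation dependencies.

```python
def make_report_nested(values):
    """'Nested' layout: the helper exists only inside its parent."""
    def scale(x):  # not visible to any test driver outside this function
        return 2 * x + 1
    return [scale(v) for v in values]


def scale(x):
    """'Library unit' layout: the same helper at the top level."""
    return 2 * x + 1


def make_report_flat(values):
    return [scale(v) for v in values]


# A test driver can exercise the flat helper in isolation...
assert scale(3) == 7
# ...but the nested helper is not an attribute of its parent, so it can
# only be exercised indirectly, by testing the parent as a whole.
assert not hasattr(make_report_nested, "scale")
assert make_report_nested([3]) == make_report_flat([3]) == [7]
```

In the nested layout the smallest testable piece is the
parent, which mirrors the team's experience of having to
integrate up to the package level before testing.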

2.2 A high degree of nesting was found
to be an impediment to reuse.

Perhaps the major advantage of using
library units instead of nested units is
that their use increases the potential of
reusability. When nesting is used, the size
of the compilation units, the component
sizes and the file sizes all tend to be
larger. Thus when these larger units are
examined for potential reuse, it is much
more likely that only a portion of the large
unit will actually have the code which
performs the needed function for the new
system. Then it becomes necessary to unnest
the code before reuse is possible. This
unnesting is very labor intensive.

Another similar Ada project presently
under development in the SEL has examined
this project’s code for reuse and has found
that it could use as much as 40% of the
original code. However, it was necessary to
unnest all of this code before reuse. The
use of library units would have enabled the
second project to reuse the code directly.

2.3 "Call-through" units are not an
efficient way to implement an object-
oriented design.

"Call-throughs" are procedures whose
only function is to call another routine.
These were used to group appropriate modules
exactly as they were represented in the
design so that a physical module of code was
created for every object in the design.
Thus, when objects were nested inside
objects, a "call-through" was used to get to
the inner object. Implementation of "call-
through" units could be accomplished using
either nested or library units. This
practice resulted in additional code which
increased the system size and testing
complexity. This unnecessary code could
have been eliminated if some of the objects
in the design were left as logical objects,
rather than coding every object in the
design to preserve the exact design
structure.
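
A call-through can be pictured with a short sketch. The
Python below is an analogy with hypothetical names
(Thruster, PropulsionCallThrough); it is not taken from
the project's Ada code.

```python
class Thruster:
    """Inner object that does the real work."""

    def fire(self, seconds: float) -> str:
        return f"fired for {seconds} s"


class PropulsionCallThrough:
    """Exists only to mirror an outer design object; adds no behavior."""

    def __init__(self) -> None:
        self._thruster = Thruster()

    def fire(self, seconds: float) -> str:
        # Pure forwarding: extra code to write, compile, and test.
        return self._thruster.fire(seconds)


# Through the call-through, and directly on the inner object:
via_wrapper = PropulsionCallThrough().fire(2.0)
direct = Thruster().fire(2.0)
```

Both calls produce the same result, so the wrapper
contributes size and test cases but no function. Leaving
the outer object as a logical grouping in the design
would let callers use the inner object directly.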

2.4 An abstract data type analysis
should be incorporated into the design
process to control types.

Since the Ada team was not previously 
experienced in Ada, it took time to get 
accustomed to the strong typing of Ada. The 
tendency was to create too many types. A 
type would be created with a strict range 
for a particular portion of the application. 
Then other areas of the application would 
need a similar type, but the original one 
would be too restrictive. So another type 
was created, along with a corresponding set 
of operations. Some of the difficulty with 
this method of typing began to emerge during 
critical design, where interface problems
developed due to typing differences. 

Multiple types also increased the 
difficulty of testing modules. Test drivers 
needed to be larger to handle multiple types 
and were often coded as large "case" 
statements in order to provide a testing 
capability for each type. 

A recommendation for future Ada 
developments is to incorporate an abstract 
data type analysis into the design process 
to control the generation of types. A more 
general new type would be defined, then many 
subtypes of that type could be used in 
various sections of the application. This 
type analysis would provide the following 
advantages: 1) operations would be reused, 
2) there would be fewer main types to 
manage, and 3) families of types would be 
developed that would inherit properties from 
each other. 
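
The recommendation can be sketched as one general type
that owns the operations and several range-restricted
members of its family. The Python below uses subclassing
as a rough stand-in for Ada subtypes; the type names and
bounds are hypothetical and are not taken from the
project.

```python
class Angle:
    """General type: owns the operations shared by the whole family."""

    LOW, HIGH = -360.0, 360.0  # hypothetical bounds, in degrees

    def __init__(self, degrees: float) -> None:
        # Each member of the family checks its own range on creation.
        if not (self.LOW <= degrees <= self.HIGH):
            raise ValueError(f"{degrees} outside {self.LOW}..{self.HIGH}")
        self.degrees = degrees

    def plus(self, other: "Angle") -> "Angle":
        # Written once on the general type and reused by every subtype.
        return Angle(self.degrees + other.degrees)


class Elevation(Angle):
    LOW, HIGH = -90.0, 90.0


class Heading(Angle):
    LOW, HIGH = 0.0, 360.0


total = Elevation(45.0).plus(Heading(90.0))
```

Here Elevation and Heading can be combined without a
conversion routine or a second set of operations, and a
test driver needs to handle only the one general type.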

SUMMARY 

In spite of a lack of experience in Ada 
at the beginning of the project, the Ada 
team was able to develop a very suitable 
dynamics simulator in Ada which meets the 
requirements originally developed for the 
FORTRAN development effort. The overall cost 
of the projects appears to be similar and 
early indications of reuse potential in the 
Ada project are very encouraging. Most of 
the problems encountered by the Ada team are 
surmountable. Many are either caused by a 
lack of experience with Ada or an immaturity 
of the tools. Both of these problems will be 
resolved in time. 

There are still many unanswered
questions to be considered on this project;
for example, nothing at all has been
mentioned about maintainability, reliability
or performance. It is still too early to
look at these results on this project, but
research efforts are continuing on this
project and several other Ada projects in the
SEL. Hopefully, these efforts will provide
even more answers about the use of Ada in the
future.

Abstract

The effective use of Ada requires the adoption of 
modern software-engineering techniques such as 
object-oriented methodologies. A Goddard Space 
Flight Center Software Engineering Laboratory Ada 
pilot project has provided an opportunity for studying 
object-oriented design in Ada. The project involves 
the development of a simulation system in Ada in 
parallel with a similar FORTRAN development. As 
part of the project, the Ada development team was
trained in and evaluated object-oriented and process-
oriented design methodologies for Ada.

In object-oriented software engineering, the software 
developer attempts to model entities in the problem 
domain and how they interact. Most previous work 
on object-oriented methods has concentrated on using 
object-oriented ideas in software design and 
implementation. However, we have also found that 
object-oriented concepts can be used advantageously 
throughout the entire Ada software life-cycle. This 
paper provides a distillation of our experiences with 
object-oriented software development. It considers 
the use of entity-relationship and process/data-flow 
techniques for an object-oriented specification which 
leads smoothly into our design and implementation 
methods, as well as an object-oriented approach to 
reusability in Ada. 


1. Introduction 

Increased productivity and reliability from using Ada 
must come from innovative application of the non- 
traditional features of the language. However, past 
experience has shown that traditional development 
methodologies result in Ada systems that "look like a 
FORTRAN design" (see, for example, [Basili 85]). 
Object-oriented techniques provide an alternative
approach to effective use of Ada. As the name
indicates, the primary modules of an object-oriented 
design are objects rather than traditional functional 
procedures. Whereas a procedure models an action, 
an object models some entity in the problem domain, 
encapsulating both data about that entity and 
operations on that data. Ada is especially suited to 
this type of design because its package facility 
directly supports the construction of objects. 

The Goddard Space Flight Center Software 
Engineering Laboratory is currently involved in an 
Ada pilot project to develop a system of about 60,000 
lines (20,000 statements) [Nelson 86, McGarry 88]. 
This project has provided an opportunity to explore 
object-oriented software development methods for 
Ada. The pilot system, known as "GRODY", is an
attitude dynamics simulator for the Gamma Ray 
Observatory (GRO) spacecraft and is based on the 
same requirements as a FORTRAN system being 
developed in parallel. 

The GRODY team was initially trained both in the 
Ada language and in Ada-oriented design 
methodologies. The team specifically studied the 
methodology promoted by Grady Booch [Booch 83] 
and the PAMELA™ methodology of George Cherry 
[Cherry 85]. Following this, during a training 
exercise, the team also began synthesizing a more 
general approach to object-oriented design. At an 
early stage of the GRODY development effort, the 
team produced high-level designs for GRODY using 
each of these methodologies. Section 2 summarizes 
the comparison of methodologies made by the 
GRODY team. 


PAMELA is a registered trademark of George W. Cherry. 

Unfortunately, the system requirements given to our
team were highly biased by past FORTRAN designs 
and implementations of similar systems. Therefore 
we began by recasting the requirements in a more 
language-independent way using the "Composite 
Specification Model" [Agresti 84, Agresti 87]. This 
method involves the use of state transition and entity- 
relationship techniques as well as more traditional 
data flow diagrams. We then designed the system to 
meet this specification, using object-oriented 
principles. The resulting design is, we believe, an 
improvement over the previous FORTRAN designs 
[Agresti 86]. The system is currently in final system 
testing. 

Previous work by the present authors has 
concentrated on using object-oriented ideas in 
software design and implementation. This work
resulted in a design method which synthesizes the best 
methods studied during the GRODY project 
[Seidewitz 86a, Seidewitz 86b]. However, we have 
found that object-oriented concepts can be used 
advantageously throughout the entire Ada software 
life-cycle [Stark 87]. Section 3 provides a distillation 
of our experience with GRODY and other Ada 
projects into an evolving life-cycle methodology. 


2. Comparison of Methodologies 

This section presents a comparison of design 
approaches to the GRO dynamics simulator, including 
the traditional functional approach used for the
FORTRAN version, the Booch methodology, 
PAMELA and the general methodology developed by 
the team itself. It should be noted that the GRODY 
team was trained in the Booch and PAMELA
methodologies in early 1985. Since then, both 
methodologies have evolved considerably, in many 
cases addressing in different ways the very issues that 
led us to develop our methodology. Nevertheless, as 
background motivation for the direction taken by the 
GRODY team, the comparison in this section is in 
terms of the 1985 versions of the methodologies. 

2.1 Functional Design 

The design of the FORTRAN version of the 
simulator is functionally-oriented. This design has a 
strong heritage in previous simulator and ground 
support systems. It consists of three major subsystems 
which interact as shown in figure 1. The "TRUTH 
MODEL" subsystem includes models of the spacecraft
hardware, the external environment and the attitude
dynamics; that is, the "real world" as opposed to the 
spacecraft control system. The SIMULATION 
CONTROL subsystem alternately activates the
SPACECRAFT CONTROL and TRUTH MODEL 
subsystems in a cyclic fashion. Each subsystem 
consists of a single driver subroutine which calls on a 
hierarchy of lower-level subroutines to perform the 
functions of the subsystem when activated by 
SIMULATION CONTROL. Data flow between 
subsystems, as well as system parameterization, is 
entirely through a set of global COMMON areas.


FIGURE 1 FORTRAN Simulator Functional Design

The strengths of this functional design lie in its
relatively simple structure and direct implementation 
in FORTRAN. However, its main drawback is the 
complete lack of encapsulation of global data. The 
only restrictions on which code may access which 
global data are enforced by programmer discipline. 
This can lead, intentionally or not, to illicit 
corruption of global data by code in one part of the 
system which is unexpected by another part of the 
system. Further, most simulation parameters are 
hard-coded into the global common area, making the 
user interface for the system hard to modify and 
impossible to generalize. 

2.2 Booch’s Methodology

Grady Booch is, perhaps, the most influential 
advocate of object-oriented design in the Ada 
community [Booch 86b, Booch 87]. As learned by the 
GRODY team, Booch’s methodology derives a design
from a textual specification or informal design 
[Booch 83], an approach adopted from Abbott 
[Abbott 83]. The technique is to underline all the 
nouns and verbs in the specification. The objects in 
the design derive from the nouns; object operations 
derive from the verbs. Obviously, some judgment 
must be used to disregard irrelevant nouns and verbs 
and to translate the remaining concepts into design 
objects. Once the objects have been identified, the 
design can then be represented diagrammatically 
using a notation which shows the dependencies 
between Ada packages and tasks which implement the 
objects. Figure 2 shows such a diagrammatic top- 
level design for GRODY. 



FIGURE 2 Object-Oriented Simulator Design 
(Booch Methodology) 

The Booch design methodology contains all the basic 
framework of the object-oriented approach. 
However, application of this methodology to GRODY 
indicated that it was not readily applicable to sizable 
systems. The team found the graphical notation clear 
but not detailed or rigorous enough. Further, Booch 
gives no explicit method for diagramming a 
hierarchical decomposition of objects, which is 
needed for any sizable system. Booch’s notation does 
not, therefore, seem to be a complete design notation. 


Note, however, that in more recent work Booch has 
extended the scope of the notation to address some of 
these shortcomings [Booch 87]. 

A second difficulty of Booch’s methodology is in the 
technique for deriving the design from the 
specification text. This works well when the 
specification can be written concisely in a few 
paragraphs. However, when the system requirements 
are large, as with GRODY, this can be difficult. In 
addition, any attempt to use such a technique directly 
on a requirements document such as ours is doomed 
to failure due to the sheer size and complexity of the 
document. Realizing such drawbacks, Booch no 
longer advocates the use of this textual method, 
which was never actually intended for large systems 
development [Booch 86b]. Instead, he derives an 
object-oriented design from a data flow diagram 
based specification [Booch 86a, Booch 87]. However, 
from the published examples it is still unclear how to 
systematically apply this method to realistic systems. 

2.3 PAMELA 

The second methodology considered by the GRODY 
team was the Process Abstraction Method for 
Embedded Large Applications (PAMELA) developed 
by George Cherry [Cherry 85, Cherry 86]. PAMELA 
is oriented toward real-time and embedded systems. 
PAMELA is process-oriented, so a PAMELA design
consists of a set of interacting concurrent processes. 
A well designed process is effectively a concurrent 
object, thus PAMELA is object-oriented in a general 
way. 

PAMELA uses a powerful graphical notation without 
many of the drawbacks found in Booch’s notation 
[Cherry 86]. During the PAMELA design process,
the designer successively decomposes processes into 
concurrent subprocesses until he reaches the level of 
primitive single-thread processes. The GRODY team 
found that PAMELA provides fairly explicit 
heuristics for constructing good processes. The 
designer uses these hints to construct the top-level 
processes from the system specification. The designer 
then recursively decomposes each non-primitive
process until only primitive processes remain. The
primitive processes can then be coded as Ada tasks 
with a single thread of control. Non-primitive 
processes are simply packages of lower level processes 
and thus contain multiple threads of control. Figure 
3 shows the top levels of a PAMELA design for 
GRODY. 
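
The recursive decomposition described above can be
sketched as a tree whose leaves are the primitive
processes. The Python below is an illustration under
stated assumptions: the Process class and the process
names are hypothetical and do not come from PAMELA or
from the GRODY design.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Process:
    name: str
    subprocesses: List["Process"] = field(default_factory=list)

    @property
    def is_primitive(self) -> bool:
        # A primitive process is not decomposed any further.
        return not self.subprocesses


def primitive_processes(process: Process) -> List[str]:
    """Walk the decomposition and return its leaves (candidate Ada tasks)."""
    if process.is_primitive:
        return [process.name]
    leaves: List[str] = []
    for child in process.subprocesses:
        leaves.extend(primitive_processes(child))
    return leaves


# Hypothetical decomposition, for illustration only.
system = Process("System", [
    Process("Control", [Process("Read_Sensors"), Process("Command_Actuators")]),
    Process("Display"),
])

tasks = primitive_processes(system)
```

In this sketch "System" and "Control" would correspond to
packages of lower level processes, while the three leaves
would each be coded as a single-thread task.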

FIGURE 3 PAMELA Simulator Design

PAMELA’s heuristics can be very effective when
designing a real-time system that is heavily driven by 
external asynchronous actions. In other cases, 
however, they require considerable interpretation to 
be applicable. Although parts of GRODY might 
conceptually be concurrent (because GRODY 
simulates actions that happen in parallel in the real 
world), there is no requirement for concurrency in 
the simulation of these actions because GRODY does 
not have to interface with any active external entity 
(except the user). In addition, since GRODY runs on 
a sequential machine, the overhead of Ada tasking 
and rendezvous could greatly degrade the time 
performance of the system. Thus, one interpretation 
of PAMELA’s principles might leave very large
sections of GRODY as primitive single-thread 
processes, with only a few concurrent objects in the 
entire design. To proceed further in the 
decomposition, the designer has to rely more on 
intuition about what makes a good object and rely 
less on the methodology. 

In fact, at the time that the GRODY team was using 
PAMELA, it provided no support for the 
decomposition and design of anything below the level 
of the primitive process, an Ada task [Cherry 85]. 
Since then, Cherry has added several concepts to the
methodology, including the use of abstract data types
[Cherry 86]. Recently he has introduced a major
update of PAMELA known as "PAMELA 2" which is
now explicitly object-oriented [Cherry 88]. In fact, 
PAMELA now stands for "Pictorial Ada Method for 
Every Large Application." It is still too early,
however, to evaluate the generality of PAMELA 2 as 
an object-oriented methodology. 

2.4 General Object-Oriented Development

As a result of the above experiences, the GRODY 
team developed its own object-oriented methodology 
which attempts to capture the best points of the 
object-oriented approaches studied by the team as 
well as traditional structured methodologies 
[Seidewitz 86a, Seidewitz 86b, Stark 87]. It is 
designed to be quite general, giving the designer the 
flexibility to explore design alternatives easily. It is 
also based on principles that guide the designer in 
constructing good object-oriented designs. This 
methodology was used to develop the complete 
detailed design for GRODY. 

This general object-oriented development ("GOOD")
methodology is based on general principles of 
abstraction, information hiding and design hierarchy 
discussed in the next section. These principles are 
less explicit than Booch’s methodology or PAMELA, 
but they do provide a firm paradigm for generating 
and evaluating an object-oriented design. Indeed, as 
mentioned above, the team found the Booch and
PAMELA design construction techniques restrictive,
often forcing the designer to rely on intuition
for object-oriented design. The GOOD methodology 
is an attempt to codify this intuition into a basic set 
of principles that provide guidance while leaving the 
designer the flexibility to explore various design 
approaches. 

In addition, we have also considered, independently 
of Booch, the transition from structured analysis 
[DeMarco 79] to object-oriented design in the context 
of the GOOD methodology, developing a technique 
known as abstraction analysis [Seidewitz 86a, 
Seidewitz 86b]. This technique is analogous to 
transform and transaction analysis used in structured 
design [Yourdon 78]. However, proceeding into 
object-oriented design from a structured analysis, by 
whatever means, requires an "extraction" of problem 
domain entities from traditional data flow diagrams. 
From an object-oriented viewpoint, it seems 
appropriate to instead begin a specification effort by 
identifying the entities in a problem domain and their 
interrelationships. Study is continuing on including 
such object-oriented system specification techniques
in the GOOD methodology and on applying object- 
oriented principles throughout the Ada life cycle 
[Stark 87]. Section 3 will discuss this in more detail. 

Figure 4 shows the actual design of the main part of 
GRODY. The object diagram notation 
[Seidewitz 86b] used in figure 4 shows the 
dependencies between the various objects which make 
up a system design, in a manner similar to Booch’s 
diagrams. However, the object diagram notation also 
explicitly includes the idea of leveled composition of 
objects, like the PAMELA process graph notation. 
Moreover, as will be discussed in more detail in
section 3, the designer may use object diagrams to 
express the design from the highest levels all the way 
down to the procedural level. (This capability has 
also been added to PAMELA 2 [Cherry 88].) 

Since GRODY was derived from the same basic 
requirements as the FORTRAN design, there are 
similarities in the designs of the two systems. 
However, there are also some fundamental differences 
in the GRODY design that can be traced to the 
object-oriented methodology. For example, in 
GRODY the TRUTH MODEL is effectively passive, 
with the SPACECRAFT CONTROL calling on 
operations as needed to obtain sensor data and 
activate actuators. All sensor and command data is 
passed using these operations. This design approach 
was encouraged by viewing the TRUTH MODEL as 
an object with multiple operations rather than as a 
functional subsystem with a single driver. 

The simulation timing of GRODY is also different 
from the FORTRAN design. The object-oriented 
methodology led to consideration of a "TIMER"
object in GRODY which provides an abstraction of 
the simulation time. This utility object provides a 
common time reference for the SPACECRAFT 
CONTROL and TRUTH MODEL separate from the 
SIMULATION CONTROL loop. Unlike the 
FORTRAN design, in GRODY the "cycle times" of 
the SPACECRAFT CONTROL and TRUTH MODEL 
are not the same. The GRODY team chose to 
faithfully model, in the SPACECRAFT CONTROL 
abstraction, the timing of the actual spacecraft control 
software, which is not under user control. However, 
GRODY allows the simulation user to set the cycle 
time for the TRUTH MODEL over a fairly wide 
range, to allow the user to trade-off speed and 
accuracy as desired. 
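
The idea of a shared time reference with independent
cycle times can be sketched as follows. This Python
fragment is an illustration only: the class names, the
polling loop, and the cycle times of 50 and 20
milliseconds are hypothetical, and it does not reproduce
the actual calling structure of GRODY, in which the
TRUTH MODEL is passive.

```python
class Timer:
    """Shared simulation clock, in integer milliseconds to avoid float drift."""

    def __init__(self) -> None:
        self.now_ms = 0

    def advance(self, step_ms: int) -> None:
        self.now_ms += step_ms


class Periodic:
    """An object that acts whenever its own cycle time has elapsed."""

    def __init__(self, name: str, cycle_ms: int) -> None:
        self.name = name
        self.cycle_ms = cycle_ms
        self.next_due_ms = 0
        self.activations = 0

    def poll(self, timer: Timer) -> None:
        # Catch up on every cycle that has come due on the shared clock.
        while timer.now_ms >= self.next_due_ms:
            self.activations += 1
            self.next_due_ms += self.cycle_ms


def run(timer: Timer, objects, tick_ms: int, end_ms: int) -> None:
    """Advance the shared clock and let each object keep its own cycle."""
    while timer.now_ms < end_ms:
        for obj in objects:
            obj.poll(timer)
        timer.advance(tick_ms)


control = Periodic("SPACECRAFT CONTROL", cycle_ms=50)  # fixed, hypothetical
truth = Periodic("TRUTH MODEL", cycle_ms=20)           # user-settable, hypothetical
run(Timer(), [control, truth], tick_ms=10, end_ms=200)
```

Over the 200 simulated milliseconds the first object is
activated 4 times and the second 10 times, each against
the same clock and neither driven by the other's cycle.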


FIGURE 4 Object-Oriented Simulator Design
(GOOD Methodology)

Finally, the PARAMETER DATABASE and 
GROUND COMMAND DATABASE objects 
encapsulate user settable parameters for the 
simulation. Similar data is contained in COMMON 
blocks in the FORTRAN design. This encapsulation 
of "global" data is typical of object-oriented designs. 
It provides both increased protection of the data 
encapsulated and increased opportunity for reuse. For 
example, the simulation parameters in the FORTRAN 
design are COMMON block parameters which must 
be hard-coded into the user interface code. (For 
simplicity the user interface modules have not been
included in figure 4.) In the GRODY design, 
simulation parameters are identified by enumeration 
constants, which allows the user interface displays to 
be parameterized by external data files. This should 
greatly increase the reusability of the user interface. 
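
A sketch of this kind of encapsulation is given below.
It is Python rather than Ada, and the parameter names,
the "NAME = value" file layout, and the load_display
function are hypothetical; only the idea of identifying
parameters by enumeration constants comes from the
GRODY design.

```python
from enum import Enum


class Parameter(Enum):
    # Hypothetical parameter names, for illustration only.
    TRUTH_MODEL_CYCLE_TIME = "truth_model_cycle_time"
    SPACECRAFT_MASS = "spacecraft_mass"


class ParameterDatabase:
    """Holds the values privately; access is only through set and get."""

    def __init__(self) -> None:
        self._values = {}

    def set(self, parameter: Parameter, value: float) -> None:
        if not isinstance(parameter, Parameter):
            raise TypeError("parameter must be a Parameter constant")
        self._values[parameter] = value

    def get(self, parameter: Parameter) -> float:
        return self._values[parameter]


def load_display(lines, database: ParameterDatabase) -> None:
    """Fill the database from 'NAME = value' lines of an external file."""
    for line in lines:
        name, _, text = line.partition("=")
        # Look the constant up by name; an unknown name raises KeyError.
        database.set(Parameter[name.strip()], float(text))


database = ParameterDatabase()
load_display(
    ["SPACECRAFT_MASS = 15000.0", "TRUTH_MODEL_CYCLE_TIME = 0.5"],
    database,
)
```

Because the display code refers to parameters only by
name, a new display can be described in a data file
without changing the code that reads it.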

The differences discussed above could probably have 
been incorporated into the FORTRAN design. 
However, it was largely the influence of the object- 
oriented approach which led to their consideration
for GRODY when they had not been considered in 
several previous designs of simulators for FORTRAN. 
Considerations of encapsulation and reusability 
indicate that the GRODY design may be "better" than 
the FORTRAN design. This is, of course, the goal of 
object-oriented methods. However, the true test of 
the merits of the GRODY design will only come from 
continuing studies of the comparative maintainability 
of the FORTRAN and Ada simulators. 

In terms of the methodology itself, the team found
the object diagram notation extremely useful for
discussing the design during development. Further,
the notation provided complete documentation of the 
design and was tailored specifically towards Ada. This 
made the transition to coding very smooth, and 
allowed the documentation to be readily updated as 
coding proceeded. By the end of coding, there were 
no major changes in the design and most changes that 
did occur were additions rather than alterations. 

The object diagram notation evolved considerably 
during the GRODY project in response to continuing 
experience with its use. The lack of a specific 
methodology at the start of the GRODY project was a 
problem for the team, as was the continuing evolution 
of the methodology over the duration of the project. 
Further, the fact that managers were not familiar 
with the new methodology made the use of object 
diagrams difficult at reviews. Another problem was 
that the detail of the object diagrams and the 
emphasis on keeping the documentation up-to-date 
required a great deal of effort to maintain a rather 
large design notebook. The team clearly saw the great 
need for automated tools to support the methodology 
in this area. Consideration has also been given to
extending the object diagram notation to better cover
such topics as generics, abstract data types and large 
system components. 


3. The GOOD Methodology 

Section 2 described the background motivation of the 
GRODY team in developing the GOOD methodology 
and applying it to the full GRODY design. The 
experience with the Composite Specification Model 
and object-oriented design on GRODY, as well as 
experience on other Ada projects, has led to the 
continuing evolution of a comprehensive, integrated, 
object-oriented approach to software development, 
encompassing all phases of the software life cycle. 
This section provides an overview of the current 
GOOD life cycle approach. 

3.1 Entities and Relationships 

The modules of an object-oriented design are 
intended primarily to represent problem domain 
entities. From an object-oriented viewpoint, it seems 
appropriate to begin a software specification effort by 
identifying the entities in a problem domain and their 
interrelationships. Entity-relationship and data flow 
techniques can then complement each other, the 
former delineating the static structure of the problem 
domain and the latter defining the dynamic function 
of a system. This is similar to the "contextual" and 
"functional" views of the Composite Specification 
Model [Agresti 84, Agresti 87]. A close relation to the 
specification approach discussed here is described in 
some detail in [Bailin 88]. 

An entity is some individual item of interest in the 
problem domain. For example, consider the 
specification of GRODY. Several problem domain 
entities immediately come to mind: the spacecraft 
structure, sensors and thrusters on the spacecraft, the 
environment, etc. An entity is described in terms of 
the relationships into which it enters with other objects. A 
spacecraft might be in a certain orientation, have 
certain thrusters, etc. Entities can also have 
attributes, such as spacecraft mass, which are data 
items describing the intrinsic properties of the entity. 

To model the structure of the problem domain 
requires the identification of entity types which are 
groups of entities with the same types of attributes 
and relationships. For example, we may define a 
SPACECRAFT STRUCTURE entity type with 
SPACECRAFT MASS and DRAG COEFFICIENT 
attributes. All SPACECRAFT STRUCTURE entities 
have these attributes, but different individual entities 
have different specific values for the attributes. 
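
As a rough illustration of the distinction between an entity type and its individual entities, the sketch below expresses the SPACECRAFT STRUCTURE entity type as an Ada record. This is not taken from the GRODY design: the Real type, the units, the entity names and the attribute values are all assumptions made for the example, and the use of 'Image on a floating point type assumes an Ada 95 or later compiler.

```ada
with Ada.Text_IO;

procedure Entity_Type_Demo is

   --  Hypothetical numeric type; the paper gives no precision or units.
   type Real is digits 6;

   --  The entity type: every SPACECRAFT STRUCTURE entity has these attributes.
   type Spacecraft_Structure is record
      Spacecraft_Mass  : Real;  --  assumed to be in kilograms
      Drag_Coefficient : Real;  --  dimensionless
   end record;

   --  Two individual entities of the same type with different attribute
   --  values (the numbers are invented for illustration).
   Observatory : constant Spacecraft_Structure :=
     (Spacecraft_Mass => 15_000.0, Drag_Coefficient => 2.2);

   Test_Article : constant Spacecraft_Structure :=
     (Spacecraft_Mass => 500.0, Drag_Coefficient => 2.0);

begin
   Ada.Text_IO.Put_Line
     ("Observatory mass: " & Real'Image (Observatory.Spacecraft_Mass));
   Ada.Text_IO.Put_Line
     ("Test article mass:" & Real'Image (Test_Article.Spacecraft_Mass));
end Entity_Type_Demo;
```

The record declaration plays the role of the entity type, while each constant is one entity with its own specific attribute values.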

A problem domain model must also include a 
specification of all possible relationships between 
various types of entities. These relationships may 
themselves have attributes and enter into other 
relationships. For example, the ATTITUDE STATE 
of a spacecraft describes its current orientation 
relative to inertial space and its current rotational 
motion. The ATTITUDE STATE is effectively a 
relationship between the spacecraft, the environment 
and the effect of any thruster firings used to reorient 
the spacecraft. This relationship has such attributes 
as the current spacecraft orientation and the 
spacecraft angular rotation rates. 

The entity-relationship diagram (ERD) is a common 
graphical tool for entity-oriented specification 
[Chen 76]. Figure 5 shows an ERD for the GRODY 
problem domain. The notation for this diagram is 
based on [Ward 85]. Complex relationships such as 
ATTITUDE STATE are shown as associative entities 
on ERDs such as figure 5. Associative entities can be 
identified on an ERD by being connected to a 
relationship symbol by an arrow. Associative entities 
are "objectivizations" of relationships which may have 
attributes and enter into other relationships. 

FIGURE 5 Attitude Dynamics Entity-Relationship 
Diagram 

Figure 5 shows only a small part of the example 
problem domain. It would grow as additional entities 
and relationships are added to describe additional 
parts of the problem domain. As the specification 
grows, a complete ERD can quickly become 
cumbersome. It is possible to "level" ERDs, showing 
complex entities on high-level diagrams which enter 
into composite relationships. These are then broken 
down in lower-level diagrams. An extended data 
dictionary notation is also useful as a textual 
representation of entity type and relationship 
definitions. In addition, the data dictionary provides 
a common basis for data definition between the static 
and the dynamic views of the problem domain. 

3.2 Processes and Data Flow 

ERDs show all possible relationships between 
different types of entities. They do not show the 
actual relationships between specific entities at 
specific points in time, nor how these actual 
relationships change over time. Data flow techniques, 
however, provide exactly this dynamic view. 
Traditional data flow diagrams (DFDs) show the flow 
of data between functional transformations. We will, 
instead, diagram the flow of data between processes 
which represent the dynamic view of one or more 
entities in the problem domain. A process is 
effectively a state machine which accepts input 
stimuli, reacts to them and produces output stimuli, 
possibly modifying some internal state data. It has no 
"operations" as such, only stimuli and responses. 
These stimuli may be either in the form of data flow 
or pure control signals. 
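
To make the state machine view of a process more concrete, here is a minimal Ada sketch of a process that reacts to stimuli and keeps internal state. It is only an illustration: the Thruster_Process name, the Stimulus and Response values, and the single React entry point are hypothetical and do not come from the GRODY specification.

```ada
with Ada.Text_IO;

procedure Process_Demo is

   --  Hypothetical stimuli and responses; the paper names no specific signals.
   type Stimulus is (Fire_Command, Stop_Command, Clock_Tick);
   type Response is (Torque_On, Torque_Off, No_Output);

   Result : Response;

   package Thruster_Process is
      --  Single entry point through which every stimulus arrives.
      procedure React (Input : in Stimulus; Output : out Response);
   end Thruster_Process;

   package body Thruster_Process is

      type Process_State is (Idle, Firing);
      State : Process_State := Idle;  --  internal state data

      procedure React (Input : in Stimulus; Output : out Response) is
      begin
         case Input is
            when Fire_Command =>
               State  := Firing;
               Output := Torque_On;
            when Stop_Command =>
               State  := Idle;
               Output := Torque_Off;
            when Clock_Tick =>
               --  The response to a tick depends on the internal state.
               if State = Firing then
                  Output := Torque_On;
               else
                  Output := No_Output;
               end if;
         end case;
      end React;

   end Thruster_Process;

begin
   Thruster_Process.React (Fire_Command, Result);
   Thruster_Process.React (Clock_Tick, Result);
   Ada.Text_IO.Put_Line ("While firing: " & Response'Image (Result));

   Thruster_Process.React (Stop_Command, Result);
   Thruster_Process.React (Clock_Tick, Result);
   Ada.Text_IO.Put_Line ("After stop:   " & Response'Image (Result));
end Process_Demo;
```

The same Clock_Tick stimulus yields TORQUE_ON while the process is firing and NO_OUTPUT after it has been stopped, which is the sense in which the response depends on state rather than on an "operation" being selected.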

To construct a dynamics data flow model, one needs 
to identify those active entities which have associated 
processes. For each relationship in the static entity- 
relationship model, we choose one of the related 
entity types to be active. This entity type has an 
associated process which is charged with maintaining 
the state of the relationship in response to internal 
and external stimuli. Note that an entity type may be 
active relative to one relationship and passive relative 
to another, and that associative entities may be active 
or passive. 

For example, consider a simplified attitude dynamics 
simulation system similar to GRODY. The attitude of 
a spacecraft is its orientation relative to inertial space, 
and an attitude dynamics simulator models the 
rotational motion of the spacecraft in response to 
external disturbances and the spacecraft control 
system. Figure 5 describes the problem domain for 
such a system. The active entities in this case interact 
in a control loop outlined in figure 6. All the 
processes shown on figure 6 are associated with active 
entities on figure 5. A data item flowing on a 
diagram such as figure 6 may be a passive entity, an 
attribute or any other composite data item or data 
element defined in the data dictionary. 

The dynamic model must also provide a specification 
for each individual process. This specification should 
include a textual description of the object as well as a 
listing of all inputs and outputs. The process 
specification also provides a place to include "non- 
functional" requirements such as timing and accuracy 
constraints. However, the main point of a process 
specification is to detail the function of the process. 
This can be in the form of structured English, a state 
transition diagram or some other appropriate notation, 
such as differential equations for the time evolution 
of the spacecraft attitude. 

FIGURE 6 Attitude Dynamics Data Flow Diagram 

The function of a process can also be given by a 
lower-level data flow diagram. Decomposition can 
continue recursively on all diagrams until all processes 
have been decomposed into primitive functions and 
states. This results in a leveling similar to the 
leveling of traditional DFDs. However, unlike DFDs, 
each object at each level of a process-data-flow 
diagram specification has a complete process 
specification. Each process must also be associated with a 
reasonable problem domain entity independently of 
its decomposition. 

3.3 Object Identification 

The intent of an object is to represent a problem 
domain entity and any associated process. The 
concept of abstraction deals with how an object 
presents this representation to other objects 
[Booch 86b, Dijkstra 68]. Ideally, the objects in a 
design should directly reflect the problem domain 
entities identified during system specification. 
However, various design considerations may require 
splitting or grouping of objects and there will almost 
always be additional objects in a design to handle 
"executive" and "utility" functions. Thus there is a 
spectrum of levels of abstraction of objects in a 
design, from objects which closely model problem 
domain entities to objects which really have no reason 
for existence [Seidewitz 86b]. The following are some 
points in this scale, from strongest to weakest: 


Entity Abstraction - An object represents a useful 
model of a problem domain entity or class of entities. 

Action Abstraction - An object provides a 
generalized set of operations which all perform 
similar or related functions (this is similar to the idea 
of a "utility" object in [Booch 87]). 

Subsystem Abstraction - An object groups together a 
set of objects and operations which are all related to a 
specific part of a larger system (this is similar to the 
"subsystem" concept in [Booch 87]). 

The stronger the abstraction of an object, the more 
details are suppressed by the abstract concept. The 
principle of information hiding states that such details 
should be kept secret from other objects [Booch 87, 
Parnas 79], so as to better preserve the abstraction 
modeled by the object. 

3.4 Design Hierarchies 

The principles of abstraction and information hiding 
provide the main guides for creating "good" objects. 
These objects must then be connected together to 
form an object-oriented design. This design is 
represented using the graphical object diagram 
notation [Seidewitz 86b]. 

The construction of an object-diagram-based design 
is mediated by consideration of two orthogonal 
hierarchies in software system designs [Rajlich 85]. 
The composition hierarchy deals with the composition 
of larger objects from smaller component objects. The 
seniority hierarchy deals with the organization of a set 
of objects into "layers". Each layer defines a virtual 
language extension which provides services to senior 
layers [Dijkstra 68]. A major strength of object 
diagrams is that they can distinctly represent these 
hierarchies. 

The composition hierarchy is directly expressed by 
leveling object diagrams (see figure 7). At its top 
level, any complete system may be represented by a 
single object which interacts with external objects. 
Beginning at this system level, each object can then 
be refined into component objects on a lower-level 
object diagram, designed to meet the specification for 
the object. The result is a leveled set of object 
diagrams which completely describe the structure of a 
system. At the lowest level, objects are completely 
decomposed into primitive objects such as procedures, 
tasks and internal state data stores. At higher levels, 
object diagram leveling can be used in a manner 
similar to Booch’s "subsystems" [Booch 87]. 



FIGURE 7 Composition Hierarchy 

The seniority hierarchy is expressed by the topology 
of connections on a single object diagram (see figure 
8). An arrow between objects indicates that one 
object calls one or more of the operations provided by 
another object. Any layer in a seniority hierarchy 
can call on any operation in junior layers, but never 
any operation in a senior layer. Thus, all cyclic 
relationships between objects must be contained 
within a virtual language layer. Object diagrams are 
drawn with the seniority hierarchy shown vertically. 
Each senior object can be designed as if the 
operations provided by junior layers were "primitive 
operations" in an extended language. Each virtual 
language layer will generally contain several objects, 
each designed according to the principles of 
abstraction and information hiding. 

3.5 System Design 

The main advantage of a seniority hierarchy is that it 
reduces the coupling between objects. This is because 
all objects in one virtual language layer need to know 
nothing about senior layers. Further, the 
centralization of the procedural and data flow control 
in senior objects can make a system easier to 
understand and modify. 



FIGURE 8 Seniority Hierarchy 


However, this very centralization can cause a messy 
bottleneck. In such cases, the complexity of senior 
levels can be traded off against the coupling of junior 
levels. The important point is that the strength of the 
seniority hierarchy in a design can be chosen from a 
spectrum of possibilities, with the best design 
generally lying between the extremes. This gives the 
designer great power and flexibility in adapting 
system designs to specific applications. 

Figure 9 shows one possible preliminary design for 
the ATTITUDE SIMULATOR. For simplicity, the 
sensors and thrusters are represented by a single 
"SPACECRAFT HARDWARE" object in figure 9. 
Note that, by convention, the arrow labeled "RUN" is 
the initial invocation of the entire system. In 
preliminary design diagrams such as figure 4, it is 
sometimes convenient to show what data flows along 
certain control arrows, much in the manner of 
structure charts [Yourdon 78] or "Buhr charts" 
[Buhr 84]. These annotations will not appear on the 
final object diagrams. 

In figure 9, the junior level components do not 
interact directly. All data flow between junior level 
objects must pass through the senior object, though 
each object still receives and produces all necessary 
data (for simplicity not all data flow is shown in 
figure 9). This design is somewhat like an object- 
oriented version of the structured designs of Yourdon 
and Constantine [Yourdon 78]. 

FIGURE 9 Centralized Design 

We can remove the data flow control from the senior 
object and let the junior objects pass data directly 
between themselves, using operations within the 
virtual language layer (see figure 10). The senior 
object has been reduced to simply activating various 
operations in the virtual machine layer, with very 
little data flow. 





FIGURE 10 Design with Decentralized Data Flow 


We can even remove the senior object completely by 
distributing control among the junior level objects 
(see figure 11). The splitting of the RUN control 
arrow in figure 11 means that the three objects are 
activated simultaneously and that they run 
concurrently. The seniority hierarchy has collapsed, 
leaving a homologous or non-hierarchical design 
[Yourdon 78] (no seniority hierarchy, that is; the 
composition hierarchy still remains). 

A design which is decentralized like figure 11 at all 
composition levels is very similar to what would be 
produced by the PAMELA methodology [Cherry 86]. 
In fact, it should be possible to apply PAMELA 
design criteria to the upper levels of an object 
diagram based design of a highly concurrent system. 
All concurrent objects would then be composed, at a 
certain level, of objects representing certain process 
"idioms" [Cherry 86]. Below this level concurrency 
would generally no longer be advantageous. 





FIGURE 11 Decentralized Design 

To complete the design, we need to add a virtual 
language layer of utility objects which preserve the 
level of abstraction of the problem domain entities. In 
the case of the ATTITUDE SIMULATOR these 
objects might include VECTOR, MATRIX, 
GROUND COMMAND and simulation 
PARAMETER types. Figure 12 shows how these 
objects might be added to the simulator design of 
Figure 10. 

Figure 12 gives one complete level of the design of 
the ATTITUDE SIMULATOR. Note that figure 12 
does not include the data flow arrows used in earlier 
figures. When there are several control paths on a 
complicated object diagram, it rapidly becomes 
cumbersome to show data flows. Instead, object 
descriptions for each object on a diagram provide 
details of the data flow. 

An object description includes a list of all operations 
provided by an object and, for each arrow leaving the 
object, a list of operations used from another object. 
We can identify the operations provided and used by 
each object in terms of the specified data flow and 
the designed control flow. The object description can 
be produced by matching data flows to operations. 
For example, the description for the ATTITUDE 
DYNAMICS object in figure 12 might be: 

Provides: 

   procedure Initialize; 
   procedure Integrate (For_Duration: in DURATION); 
   procedure Apply (Torque: in VECTOR); 
   function Current_Attitude return ATTITUDE; 
   function Current_Angular_Velocity 
      return VECTOR; 

Uses: 

   5.0 LINEAR ALGEBRA 
      Add (Vector) 
      Dot 
      Multiply (Scalar) 
      Multiply (Matrix) 

   6.0 PARAMETER DATABASE 
      Get 

We could next proceed to refine the objects used in 
figure 12 and recursively construct lower level object 
diagrams. These lower level designs must meet the 
functionality of the system specification and provide 
the operations listed in the object description. The 
design process continues recursively until the entire 
system is designed and all objects are completely 
decomposed. 

The GRODY design of figure 4 is basically the same 
as the example design of figure 12. However, the 
GRODY team chose to simplify the design by 
combining the ATTITUDE DYNAMICS and 
SPACECRAFT HARDWARE objects into a single 
TRUTH MODEL subsystem object, similar to the 
corresponding subsystem in the FORTRAN design. 
Further, in GRODY, the LINEAR ALGEBRA 
functions are part of a UTILITIES module not shown 
in figure 4. 




FIGURE 12 Attitude Dynamics Simulator Design 

3.6 Implementation 

The transition from an object diagram to Ada is 
straightforward. Package specifications are derived 
from the list of operations provided by an object. For 
the ATTITUDE DYNAMICS object the package 
specification is: 

package Attitude_Dynamics is 

   subtype ATTITUDE is Linear_Algebra.MATRIX; 

   procedure Initialize; 
   procedure Integrate 
      ( For_Duration : in DURATION ); 
   procedure Apply 
      ( Torque : in Linear_Algebra.VECTOR ); 

   function Current_Attitude 
      return ATTITUDE; 
   function Current_Angular_Velocity 
      return Linear_Algebra.VECTOR; 

end Attitude_Dynamics; 

The package specifications derived from the top level 
object diagram can either be made library units or 
placed in the declarative part of the top level Ada 
procedure. For lower level object diagrams the 
mapping is similar, with component package 
specifications being nested in the package body of the 
composite object. States are mapped into package 
body variables. This direct mapping produces a highly 
nested program structure. Alternatively, some or all 
of these packages can be made library units or even 
reused from an existing library. However, this may 
require additional packages to contain data types and 
state variables used by two or more library units. 
Nevertheless, experience has shown that, to promote 
reusability and reduce the compilation burden, it is 
best to avoid nesting of code [Godfrey 87], though it 
is important to retain leveling in the design. 

The process of transforming object diagrams to Ada 
is followed down all the object diagram levels until 
we reach the level of implementing individual 
subprograms. Low-level subprograms can be designed 
and implemented using traditional functional 
techniques. They should generally be coded as 
subunits, rather than being embedded in package 
bodies. 

The clear definition of abstract interfaces in an 
object-oriented design can also greatly simplify 
testing. When testing an object, there is a well 
defined "virtual language" of operations it requires 
from objects at a junior level of abstraction, some of 
which may be stubbed-out for initial testing. Further, 
object-oriented composition encourages incremental 
integration testing, since the "unit testing" of a 
composite object really consists of "integration 
testing" the component objects at a lower level of 
abstraction. 
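
The idea of stubbing out a junior-level object for initial testing can be sketched as follows. This is a hedged illustration only: the string-keyed Get profile, the Spacecraft_Mass operation, the parameter name and the canned value are all assumptions, and the packages are nested in one procedure purely so that the example is a single compilation unit.

```ada
with Ada.Text_IO;

procedure Stub_Test_Demo is

   type Real is digits 6;

   --  Junior-level object. Only its specification matters to the object
   --  under test; the profile of Get is an assumption.
   package Parameter_Database is
      function Get (Name : String) return Real;
   end Parameter_Database;

   --  Object under test. Spacecraft_Mass is a hypothetical operation
   --  added so the test has something to observe.
   package Attitude_Dynamics is
      procedure Initialize;
      function Spacecraft_Mass return Real;
   end Attitude_Dynamics;

   --  Stub body: returns canned values instead of reading a real database.
   package body Parameter_Database is
      function Get (Name : String) return Real is
      begin
         if Name = "SPACECRAFT_MASS" then
            return 1_000.0;
         else
            return 0.0;
         end if;
      end Get;
   end Parameter_Database;

   package body Attitude_Dynamics is

      Mass : Real := 0.0;  --  state held as a package body variable

      procedure Initialize is
      begin
         Mass := Parameter_Database.Get ("SPACECRAFT_MASS");
      end Initialize;

      function Spacecraft_Mass return Real is
      begin
         return Mass;
      end Spacecraft_Mass;

   end Attitude_Dynamics;

begin
   Attitude_Dynamics.Initialize;
   if Attitude_Dynamics.Spacecraft_Mass = 1_000.0 then
      Ada.Text_IO.Put_Line ("PASS");
   else
      Ada.Text_IO.Put_Line ("FAIL");
   end if;
end Stub_Test_Demo;
```

Because Attitude_Dynamics depends only on the specification of Parameter_Database, the stub body could later be replaced by the real one without changing the object under test.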

3.7 Reuse 

The concept of generic objects provides a powerful 
tool for reusability. Generic parameters may be used 
to cut the dependencies of a general object on other 
specific objects, allowing the general object to be 
reused in similar but different contexts. Consider, 
for example, a general numeric integrator with the 
following package specification: 


generic 

   type REAL is digits <>; 
   type STATE_VECTOR is 
      array (INTEGER range <>) of REAL; 
   with function State_Derivative 
      ( T : DURATION; -- from reference time 
        X : STATE_VECTOR ) 
      return STATE_VECTOR; 

package Generic_Integrator is 

   procedure Integrate 
      ( For_Duration : in DURATION ); 
   function Current_State 
      return STATE_VECTOR; 
procedure Initialize 

end Generic_Integrator; 

This package provides the ability to numerically 
integrate a vector differential equation with an 
arbitrary state vector size. The "Integrate" procedure 
can be implemented as a vector equation, or as a set 
of individual real-valued functions. To implement it 
as a single vector equation we will need the 
operations provided by a LINEAR ALGEBRA object. 
These operations can be incorporated in two ways. 
One possibility is to make the operations needed into 
generic formal parameters. Another is to have the 
body of the integrator depend directly on LINEAR 
ALGEBRA. 

Each of these methods has advantages and drawbacks. 
Using generic formal subprograms enhances 
reusability by making the component self-contained, 
but if too many are needed the interface becomes 
complex. Depending on LINEAR ALGEBRA within 
the GENERIC INTEGRATOR makes a cleaner 
interface, but couples the generic package to another 
library unit. The GRODY team has used both 
methods. Figure 13 shows the composition of 
GENERIC INTEGRATOR assuming the latter choice. 

Figure 14 shows a use of the GENERIC 
INTEGRATOR in the composition of the 
ATTITUDE DYNAMICS object. The component 
object ATTITUDE INTEGRATOR is an instantiation 
of the GENERIC INTEGRATOR object. The generic 
object is instantiated in figure 14 with the 
ATTITUDE EQUATION subprogram as the generic 
actual parameter. Most of the ATTITUDE 
DYNAMICS operations are shown in figure 14 as 
component procedures, represented by rectangles. The 
"Integrate" operation, however, is directly inherited 
from the ATTITUDE INTEGRATOR object. 
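
The instantiation described above might look roughly like the following sketch. It is not the GRODY implementation. Several things are assumptions made so that the example is complete: the Initial_State generic formal object (added here to fix the size of the state vector), the fixed step size, the simple Euler update written component by component rather than through LINEAR ALGEBRA operations, and the toy Attitude_Equation, which just computes dx/dt = -x.

```ada
with Ada.Text_IO;

procedure Integrator_Demo is

   type Rate is digits 6;
   type Rate_Vector is array (Integer range <>) of Rate;

   generic
      type Real is digits <>;
      type State_Vector is array (Integer range <>) of Real;
      with function State_Derivative
        (T : Duration;  --  from reference time
         X : State_Vector) return State_Vector;
      Initial_State : State_Vector;  --  assumption, not in the paper's spec
   package Generic_Integrator is
      procedure Initialize;
      procedure Integrate (For_Duration : in Duration);
      function Current_State return State_Vector;
   end Generic_Integrator;

   package body Generic_Integrator is

      Step  : constant Duration := 0.01;  --  hypothetical fixed step size
      Time  : Duration := 0.0;
      State : State_Vector (Initial_State'Range) := Initial_State;

      procedure Initialize is
      begin
         Time  := 0.0;
         State := Initial_State;
      end Initialize;

      procedure Integrate (For_Duration : in Duration) is
         Remaining : Duration := For_Duration;
         H         : Duration;
      begin
         while Remaining > 0.0 loop
            if Remaining < Step then
               H := Remaining;
            else
               H := Step;
            end if;
            declare
               Deriv : constant State_Vector (State'Range) :=
                 State_Derivative (Time, State);
            begin
               --  Euler update, one component at a time.
               for I in State'Range loop
                  State (I) := State (I) + Real (H) * Deriv (I);
               end loop;
            end;
            Time      := Time + H;
            Remaining := Remaining - H;
         end loop;
      end Integrate;

      function Current_State return State_Vector is
      begin
         return State;
      end Current_State;

   end Generic_Integrator;

   --  Toy stand-in for the attitude equations of motion: dx/dt = -x.
   function Attitude_Equation
     (T : Duration;
      X : Rate_Vector) return Rate_Vector
   is
      Result : Rate_Vector (X'Range);
   begin
      for I in X'Range loop
         Result (I) := -X (I);
      end loop;
      return Result;
   end Attitude_Equation;

   --  The generic actual parameter is the Attitude_Equation subprogram.
   package Attitude_Integrator is new Generic_Integrator
     (Real             => Rate,
      State_Vector     => Rate_Vector,
      State_Derivative => Attitude_Equation,
      Initial_State    => (1 => 1.0, 2 => 0.5, 3 => 0.0));

begin
   Attitude_Integrator.Integrate (For_Duration => 1.0);
   declare
      Final : constant Rate_Vector := Attitude_Integrator.Current_State;
   begin
      for I in Final'Range loop
         Ada.Text_IO.Put_Line
           (Integer'Image (I) & " =>" & Rate'Image (Final (I)));
      end loop;
   end;
end Integrator_Demo;
```

With this toy derivative, each component should decay to roughly exp(-1) times its initial value after one second. The generic itself knows nothing about attitude dynamics, which is what allows it to be reused in other contexts.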



Figure 13 Generic Integrator Object Composition 

Ada features such as generic packages are useful 
tools, but language features are not sufficient to 
guarantee high levels of software reuse. What is also 
needed is an approach to specifying and designing 
reusable components. Using an object-oriented 
approach is useful not because object-oriented design 
is essential for reuse, but because the underlying 
concepts are. These crucial concepts are abstraction, 
information hiding, levels of virtual languages (often 
called virtual "machines") and inheritance [Parnas 79, 
Cox 86]. 

Smalltalk’s subclassing [Goldberg 83] provides an 
elegant means of supporting inheritance. Ada does not 
directly support inheritance, but the concept can be 
simulated by using "call-throughs." A call-through is 
a subprogram that has little function other than to call 
on another package’s subprogram. To simulate 
inheritance when implementing the 
Attitude_Dynamics package, the subprogram Integrate 
would be respecified in the Attitude_Dynamics 
package, with the subprogram body in 
Attitude_Dynamics calling on the corresponding 
operation from Attitude_Integrator. 
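
A call-through of this kind can be sketched in a few lines. The Attitude_Integrator below is a hypothetical stand-in (a plain package that merely accumulates elapsed time, rather than an instantiation of the generic integrator), and the Elapsed operation is invented so that the effect of the call is observable.

```ada
with Ada.Text_IO;

procedure Call_Through_Demo is

   --  Hypothetical stand-in for the instantiated integrator.
   package Attitude_Integrator is
      procedure Integrate (For_Duration : in Duration);
      function Elapsed return Duration;
   end Attitude_Integrator;

   --  The composite object respecifies only the operation it "inherits".
   package Attitude_Dynamics is
      procedure Integrate (For_Duration : in Duration);
   end Attitude_Dynamics;

   package body Attitude_Integrator is

      Total : Duration := 0.0;

      procedure Integrate (For_Duration : in Duration) is
      begin
         Total := Total + For_Duration;
      end Integrate;

      function Elapsed return Duration is
      begin
         return Total;
      end Elapsed;

   end Attitude_Integrator;

   package body Attitude_Dynamics is

      --  Call-through: no function other than forwarding the call.
      procedure Integrate (For_Duration : in Duration) is
      begin
         Attitude_Integrator.Integrate (For_Duration);
      end Integrate;

   end Attitude_Dynamics;

begin
   Attitude_Dynamics.Integrate (For_Duration => 2.5);
   Ada.Text_IO.Put_Line
     ("Elapsed:" & Duration'Image (Attitude_Integrator.Elapsed));
end Call_Through_Demo;
```

Note that Elapsed is deliberately not respecified in Attitude_Dynamics, so clients of the composite object see Integrate but not Elapsed.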



FIGURE 14 Attitude Dynamics Object Composition 


This technique is clearly less elegant than Smalltalk 
subclassing, but it also has advantages. First, Ada 
allows inheritance from more than one object. 
Second, Smalltalk forces the inheritance of all 
operations and data. An operation can be overridden, 
but not removed, from a class. The Ada specification 
of the composite package gives the developer precise 
control over which operations and data items are 
visible or accessible. (See [Seidewitz 87] for a more 
detailed discussion of Ada and the concept of 
inheritance.) 


4. Conclusion 

The GRODY project has provided an extremely 
valuable experience in the application of object- 
oriented principles to a real system. This experience 
guided the creation of the GOOD methodology which 
is now being used on an increasing number of 
projects inside and outside of the Goddard Space 
Flight Center. As with any pilot project, some of the 
major products of GRODY are the lessons learned 
along the way. 

As part of the GRODY project, a detailed assessment 
has been made of the team’s experiences during 
design [Godfrey 87]. At this time, however, most of 
the observations must remain qualitative. 
Nevertheless, it is clear that the GRODY design is 
significantly different from previous FORTRAN 
simulator designs [Agresti 86]. 













It also became clear during the GRODY project that 
the GOOD methodology does not fit comfortably into 
the traditional life cycle management model. At the 
very least, the design phase should be extended and 
design reviews should occur at different points in the 
life cycle. The preliminary design review should 
occur later in the design phase and should include 
detailed object diagrams for the upper levels of the 
system, perhaps down to the level at which the design 
becomes more procedural than object-oriented. The 
critical design review would then include the detailed 
procedural designs, perhaps using an Ada-based 
design language. This review might actually take 
place as a series of incremental reviews of different 
portions of the design. This latter approach is 
supported by the well-defined modularity of an 
object-oriented design. 

The traditional functional viewpoint provides a 
comprehensive framework for the entire software life 
cycle. This viewpoint reflects the action-oriented 
nature of the machines on which software is run. 
The object-oriented approach discussed here can also 
provide a comprehensive view of the life cycle. The 
object-oriented viewpoint, however, reflects the 
natural structure of the problem domain rather than 
the implicit structure of our hardware. Thus, it 
provides a "higher-level" approach to software 
development which decreases the distance between 
problem domain and software solution. By making 
complex software easier to understand, this simplifies 
both system development and maintenance. 

Abstract 

We need to understand the effects that introducing 
Ada has on the software development environment. 
This paper is about the lessons learned from an ongoing 
Ada project in the Flight Dynamics division of the 
NASA Goddard Space Flight Center. It is part of a 
series of lessons learned documents being written for 
each development phase. 

FORTRAN is the usual development language in 
this environment. This project is one of the first to use 
Ada in this environment. The experiment consists of 
the development of two spacecraft dynamics simulators. 
One is done in FORTRAN with the usual development 
techniques, and the other is done with Ada. The Ada 
simulator is 135,000 lines of code (LOC), and the 
FORTRAN simulator is 45,000 LOC. 

We want to record the problems and successes 
which occurred during implementation. Topics which 
will be dealt with include (1) use of nesting vs. library 
units, (2) code reading, (3) unit testing, and (4) lessons 
learned using special Ada features. 

It is important to remember that these results are 
derived from one specific environment; we must be very 
careful when extrapolating to other environments. 
However, we believe this is a good beginning to a better 
understanding of Ada use in production environments. 


Ada is a trademark of the U.S. Department of Defense - Ada Joint 
Program Office. 

Contact: Carolyn Brophy, Department of Computer Science, 
University of Maryland, College Park, MD 20742, (301) 454- 

Ada incorporates many software development concepts; 
it is much more than "just another language". 
As such, we need to understand the effects of introduc- 
ing Ada into the software development environment. 
This paper concentrates on the lessons learned from an 
ongoing Ada project in the Flight Dynamics Division of 
the NASA Goddard Space Flight Center (GSFC). The 
Ada project is sponsored by the GSFC Software 
Engineering Laboratory (SEL). It is part of a series of 
lessons learned documents being written for each 
development phase. 

Environment 

FORTRAN is the usual development language in 
this environment. The flight dynamics applications 
involve mission analysis and spacecraft orbit and atti- 
tude determination and control. Many of the software 
development projects are similar from mission to mis- 
sion providing, for example, an attitude ground support 
system or an attitude dynamics simulator. This pattern 
of developing similar applications is important for 
domain expertise and for the legacy developed in this 
environment for code, designs, expectations and intui- 
tions. The similarity between projects allows a high 
level of reuse of both design and code. Since the 
problems are basically familiar ones, the development 
methodologies which involve much iteration do not seem 
to be necessary. The waterfall development model is 
basically used here, and seems to work well in this case. 
Lessons learned from the initial uses of Ada do not 
include changing this basic methodology. 

Project 

The project was originally designed as a parallel 
study with two teams. Each would develop a spacecraft 
dynamics simulator, one with FORTRAN as the imple- 
mentation language, and one with Ada as the implemen- 
tation language. The specifications for each simulator 
were the same, supporting the upcoming Gamma Ray 
Observatory (GRO) mission. However, there are many 
other differences between the projects which keep the 
study from being truly "parallel". The FORTRAN 
version was the production version, thus they had 
scheduling pressures the Ada team did not have. Without 
scheduling pressures, the Ada team made enhancements 
in their version not required by the specifications, which 
increased time spent on the project. This was also the 
first time any of these team members had done an Ada 
project while the FORTRAN team was quite experi- 
enced with the use of FORTRAN. The Ada team 
required training in the language and development 
methodologies associated with Ada, while the FORTRAN 
team did things in the usual way [McGarry, 
Page et al. 83]. The Ada team also experimented with 
various design methodologies; this was necessary to find 
which ones would work better for this development 
environment. The FORTRAN team was working with 
a mature and stable environment. In switching to Ada, 
the legacy of reuse for design, code, intuitions and 
experience is gone, and will be rebuilt slowly in the 
new language. 

The philosophies of development were quite 
different between the two projects. The Ada team con- 
sistently applied the ideas of data abstraction and infor- 
mation hiding to their design development. The FOR- 
TRAN development used structural decomposition 
methods. 

Our goals with this project include: 

(1) How is the use of Ada characterized in this 
environment? 

(2) How should the existing development process be 
modified to best change over from FORTRAN to 
Ada? 

(3) What problems have been encountered in 
development? What ways have we found to deal 
with them? 

Current Project Status 

Both the FORTRAN and Ada teams started in 
January, 1985. The Ada team began with training in
Ada, while the FORTRAN team immediately began
requirements analysis. The FORTRAN team delivered
its product (45K) after completing acceptance testing in
June, 1987. The Ada team is scheduled to finish system
testing its 135K product in February, 1988. Discussions
of the product size differences and effort distributions 
are presented in [McGarry, Agresti 88]. 

The lessons learned from major phases in the Ada 
development are being recorded in a series of SEL 
reports: Ada training [Murphy, Stark 85], design [God- 
frey, Brophy 87], and implementation [in preparation]. 
This paper presents some of the main results from the 
implementation (code and unit test) lessons learned. 


Lessons Learned 

1. Nesting vs. Library Units 

1.1 The flat structure produced by using library units has
advantages over a heavily nested structure

Nesting has many effects on the resulting product.
The primary advantage of nesting is that it enforces the
principle of information hiding structurally, because of
the Ada visibility rules. With library units, by contrast,
the only way to avoid violations of information hiding is
through self-discipline. In addition, the dot notation
identifies the package in which a module is located.

There are quite a few disadvantages to nesting,
however. Nesting makes reuse more difficult. A second
dynamics simulator in Ada is now being developed
which can reuse up to 40% of the Ada project's code.
But in order to reuse it, the nested code has to be
unnested, since the new application only needs some of
the nested units. This is often a labor intensive
operation. Nesting also increases the amount of
recompilation required when changes are made, since Ada
assumes dependencies between even sibling nested
objects/procedures, even when the dependency is not
really there. This requires more parts of the system to
be recompiled than is necessary when more library units
are used. It is also harder to trace problems back
through nested levels than it is through levels of library
units. There is no easy way to tell where a unit of code
was called from, when it is nested. But library units
have the "with" clauses to identify the source of a piece
of code. For this reason it is now believed that overuse
of nesting at the expense of using more library units
makes maintenance harder. This is contrary to the
team's earlier expectations. The team had used nesting
successfully before on a 5000-line training project.
However, this kind of approach does not scale up
well when developing large projects.

Library units seem to have a lot of advantages.
Besides fewer recompilations when changes are made,
and easier unit testing, every library unit can easily be
made visible to any other library unit merely by use of
the "with" clause. In nested units this visibility does not
exist, and a debugger becomes essential to see what is
happening at the deeper levels that are not within the
scope of the test driver. Library units allow smaller
components, smaller files, smaller compilation units, and
less duplication of code. The system is more
maintainable, since it is easier to find the unit desired.
Reuse with library units is also easier, since the parts of
the system are smaller. Configuration control is also easier
with library units since more pieces are separate (i.e.,
the ratio of changes to code segments modified is closer
to 1). The major disadvantage seems to be that a
complicated library structure develops, which can lead to
errors by the developers. However, if the Ada project
were to be done over now, the team would use more
library units, and nest less.

Advantages and Disadvantages of
Nesting vs. Library Units

NESTING

Advantages

* information hiding
* visibility control
* type declarations in one place

Disadvantages

* enlarged code
* more recompilations
* harder to trace problems through nested levels
* can't easily tell where a unit of code is called from
* type declarations in one place means problems
  for reuse
* harder maintenance
* debugger required
* larger unit sizes inhibit code reading
* harder to reuse part of the system

LIBRARY UNITS

Advantages

* fewer recompilations
* easier unit testing
* smaller components
* smaller files
* smaller compilation units
* less code duplication
* easier maintenance
* "with" clauses show source of other code units used
* easier reuse
* easier configuration control

Disadvantages

* no information hiding
* complex library structure


1.2 The balance between nesting and library units is an 
important implementation issue, not a design issue 

The issue of whether to use library units or nested
units first arises in the design phase. At least this is the
case if it is assumed that the design documents reflect
this aspect of implementation (i.e., the design documents
indicate in some way when nesting is intended vs.
when library units should be used). While it is
appropriate for the design to show dependencies, these
should not dictate implementation, as far as the library
unit/nesting question is concerned. The team considered
the decisions concerning nesting/library units to
be an implementation issue.


The library units in the Ada project went down 
about 3 to 4 levels, while nesting went down many lev- 
els below that. Another view of the system shows the 
Ada project had 124 packages and 55 library units. 
During implementation most team members felt an 
appropriate balance had been reached between nesting 
levels and number of library units. However, in retrospect,
several felt the nesting had been overdone.

1.3 It appears best to use library units at least down to
the subsystem level, and nesting at lower levels where
there is minimal interaction among a small number of
modules

Experiences with unit testing seem to indicate that 
library units should at least go down to the subsystem 
level. This makes testing easier. Below this level the 
benefits of nesting sometimes become too important to 
ignore. This is one heuristic which could be used to 
help determine when the transition from library units to 
nested units should occur. 

An additional way to determine when the transi- 
tion should occur is to examine the degree of interaction 
between pieces. For modules which interact heavily, 
library units are preferred. At the point where the 
interaction drops off, using nested units is preferable. 
Sections with nested code are easier to deal with when 
they are small. 

1.4 In mapping design to code, caution should be used in 
applying too rigorous a set of rules for visibility control. 

In an attempt to control visibility, two features
appear to have been too rigorously applied. The first
feature is nesting. The design of the Ada project
seemed to suggest a particular nesting implementation.
But this created many objects within objects yielding a
high degree of nesting. The second way to control
visibility is through the use of many “call-through”s
(procedures whose only function is to call another routine).
“Call-through”s were used to group appropriate pieces
together exactly as represented in the design. They can
be implemented via nesting or library units. Faithfulness
to the design structure was maintained this way.

The design had non-primitive objects with specific
operations. These objects were implemented as packages.
To put the specific operations (subprograms) into
the objects (packages) the team used “call-through”s.
Thus a physical piece of code was created for every
object in the design. “Call-through”s are one of the
reasons for the expanded code in the Ada project when
compared to the FORTRAN version. It is estimated
that out of the 135K LOC making up the Ada system,
22K LOC (specifications and bodies) are because of
“call-through”s. While “call-through”s provide a good
way to collect things into subsystems, these should be
limited to only two or three levels in the future.

If the implementation were to be done over now,
many of the existing “call-through”s would be
eliminated. Instead of creating actual code to correspond
with every object in the design, some objects in the
design would remain “logical objects”. No actual packages
would exist; instead, a logical object would be
made up of a collection of lower level objects.
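
As a minimal sketch of what a “call-through” looks like, the
single-file Ada program below wraps a lower level object in a
package whose only job is to forward the call. The names
Thruster_Model, Propulsion, and Fire_Thruster are hypothetical
and are not taken from the project's code.

```ada
with Text_IO;

procedure Call_Through_Demo is

   -- Lower level object that does the real work (hypothetical name).
   package Thruster_Model is
      procedure Fire (Count : out Natural);
   end Thruster_Model;

   -- Higher level design object implemented only with a call-through.
   package Propulsion is
      procedure Fire_Thruster (Count : out Natural);
   end Propulsion;

   Total : Natural;

   package body Thruster_Model is
      Firings : Natural := 0;

      procedure Fire (Count : out Natural) is
      begin
         Firings := Firings + 1;
         Count   := Firings;
      end Fire;
   end Thruster_Model;

   package body Propulsion is
      -- A call-through: its only function is to call another routine.
      procedure Fire_Thruster (Count : out Natural) is
      begin
         Thruster_Model.Fire (Count);
      end Fire_Thruster;
   end Propulsion;

begin
   Propulsion.Fire_Thruster (Total);
   Text_IO.Put_Line ("Firings so far:" & Natural'Image (Total));
end Call_Through_Demo;
```

Treating Propulsion as a “logical object” would mean deleting
that package and letting callers name Thruster_Model directly.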

2. Code Reading 

Code reading is generally done with unit testing.
The developer doing the code reading is not the one
who developed the code. Comments are returned to the
original developer. After code reading and unit testing,
the unit is put under configuration control.

2.1 Code reading helps in training people to use Ada. 

Besides helping to find errors, code reading has the
benefit of increasing the proficiency of team members in 
Ada. Individuals can see new ways to handle the algo- 
rithms being encoded. Code reading also allows another 
person besides the original developer to understand a 
given part of the project. This insight should help 
understanding and lead to better solutions of problems 
in the future. 

2.2 Code reading helps isolate style and logic errors. 

The most common errors found in code reading 
with Ada were style errors. The style errors involved 
adding or deleting comments, format changes, and 
changes to debug code (not left in the final product). 
Other types of errors found are initialization errors, and 
problems with incompatibilities between design and 
code. This can be due to either a design error or a cod- 
ing error. 

Because the Ada compiler exposes many errors not
exposable by a FORTRAN compiler, code reading Ada
has a different flavor than code reading FORTRAN.
For example, the Ada compiler exposes such errors as
(1) wrong data types, (2) call sequencing errors, (3)
variable errors: either the variable is declared and never
used, or it is used without being declared. So, one
seasoned FORTRAN developer working on the Ada
project noted that code reading is more interesting in
FORTRAN, since there were more interesting errors found in
code reading FORTRAN, not found in reading Ada
code. In general, logic errors are hard to find in this
application domain, but enough logic errors are found to
make code reading worthwhile.

Some of the difficulty of code reading with Ada on 
this project was due to the heavy nesting and the 
number of “call-through” units. Code reading would 
have been helped by a flatter implementation. The 
SEPARATE facility makes it necessary to look in many 
places at once to follow the code. However, code read- 
ing in Ada was easier than in FORTRAN because the 
code was more English-like, and hence, more readable. 
Often the reused FORTRAN code is an older variety 
without the structured constructs available in later ver- 
sions. 

Code reading tended to miss errors that spanned 
multiple units. This would be expected with any imple- 
mentation language as well. One example was a prob- 
lem where records were skipped when they were being 
output. The debugger actually found the problem. 

Regardless of the implementation language, code reading
appears to be important for highly algorithmic routines.
Groups of routines that are used only to call others may
be checked to make sure the design's purity is
maintained.

3. Unit Testing 

3.1 Unit testing was found to be harder with Ada than
with FORTRAN. 

The FORTRAN units are already relatively iso- 
lated; this makes unit testing easy. Only the global 
COMMONS need to be added to do the unit tests. On 
the other hand, the Ada units require a lot of "with'd
in" code, and are much more interdependent. Another
very different Ada project had perhaps even more inter- 
dependence between its modules than the Ada project 
did. That team also found the interdependence made 
unit testing very difficult. More interdependence exists 
between Ada units because there are more relations to 
express in Ada. There are textual inclusion (nesting), 
with-ing in (library units), and invocation. FORTRAN 
only has invocation. 

3.2 The introduction of Ada as the implementation 
language changed the unit testing methods dramatically. 

Unit testing with Ada was done very differently. 
Since one unit depends on many others, it is usually 
hard to test a unit in isolation, so this was generally not 
done. The Ada pieces were integrated up to the package
level, and then unit testing was done. Then testing
was done with groups of units that logically fit together, 
rather than actual unit testing. The integrated units 
are tested, choosing only a subset of possible paths at a 
time. The debugger is used to look at a specific unit, 
since the test drivers cannot "see" the nested ones. 
With Ada projects a debugger becomes essential. This 
is in contrast to the usual development in FORTRAN 
where no integration occurs at all until after unit test- 
ing. 

This shows that the biggest difference between the 
way FORTRAN and Ada projects are done at this point 
in development is incremental integration. This actu- 
ally represents a change in the development lifecycle of 
an Ada product; integration and unit testing are
alternately done rather than finishing unit testing before
integration. 

3.3 The library unit/nesting level issue directly affects
the difficulty of unit testing. 

The greater the nesting level, the more difficult 
unit testing is, since the lower level units in the subsys- 
tem are not in the scope of the test driver. This is the 
primary reason a debugger becomes a required testing 
aid with Ada projects. For this reason, more library 
units and less nesting would have made testing easier. 
Library units have to go down to a level in the design 
that makes testing more feasible. With the Ada project 
that would have meant taking library units down to a 
lower level in the design, if the project were to be done 
over. 


Two other ways to deal with the nesting during
unit test were tried and were not very successful. One
solution pulls an inner package out, and includes the
types and "with'd in" modules the outer package used
in order to execute the inner one. This is difficult to do
for each unit. The other solution is to modify the
specifications of the outer package so that nested
packages can be "seen" by the test driver. This solution
requires lots of recompilation. With more library units,
there would be less recompilation, because there would
be fewer changes of specifications. Again however, the
best way to test was to use the debugger on unaltered
code.
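
The visibility problem described above can be sketched in a
few lines of Ada. The package names Attitude and Filter are
hypothetical; the point is only that a package nested in a body
cannot be named by an outside test driver.

```ada
with Text_IO;

procedure Nesting_Visibility_Demo is

   package Attitude is
      function Update return Integer;
      -- Nothing about Filter appears here, so a test driver
      -- cannot name it.
   end Attitude;

   Result : Integer;

   package body Attitude is

      -- Nested in the body: hidden from everything outside Attitude.
      package Filter is
         function Smooth (Raw : Integer) return Integer;
      end Filter;

      package body Filter is
         function Smooth (Raw : Integer) return Integer is
         begin
            return Raw / 2;
         end Smooth;
      end Filter;

      function Update return Integer is
      begin
         return Filter.Smooth (10);
      end Update;

   end Attitude;

begin
   -- Legal: Update is declared in the visible part of Attitude.
   Result := Attitude.Update;
   Text_IO.Put_Line (Integer'Image (Result));

   -- Rejected by the compiler if uncommented: Filter is not visible here.
   -- Result := Attitude.Filter.Smooth (10);
end Nesting_Visibility_Demo;
```

Moving the declaration of Filter into the specification of
Attitude makes it testable, but every unit that depends on that
specification must then be recompiled.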

3.4 The importance of unit testing seems to be more
related to application area than to implementation
language.

Whether the implementation is in FORTRAN or
Ada does not seem as important as whether the
application has lots of calculations or has lots of data
manipulations. Unit testing seemed more valuable with
scientific applications; perhaps because calculation errors 
show up when only a small amount of localized code is 
executed. But data manipulation errors require more of 
the system to be operating before it is known if errors 
are present. 

4. Use of Ada's Special Features

4.1 Separation of specifications and bodies is quite
beneficial and easy to implement. 

The team entered the specifications first, whenever 
possible, before the rest of the code. This gave a high 
level view of the system early in the development. 
Another benefit is that this helped clarify the interfaces 
early. Separating the specifications and bodies also 
reduces the amount of recompilation required when 
changes are made. 

4.2 Generics were fairly easy to implement and they 
reduce the amount of code required. 

The only problems encountered were with correct 
compilation of the generics in some cases, due to com- 
piler bugs in an early version of the compiler, rather 
than incorrect code. As Ada matures, this will not be a 
problem at all. 
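
A small illustration of how a generic reduces the amount of
code: one generic procedure is written once and instantiated
for two types. The names Swap, Swap_Integers, and
Swap_Booleans are hypothetical and are not from the project.

```ada
with Text_IO;

procedure Generic_Demo is

   -- One template, written once for any private (assignable) type.
   generic
      type Item is private;
   procedure Swap (Left, Right : in out Item);

   A : Integer := 1;
   B : Integer := 2;
   P : Boolean := True;
   Q : Boolean := False;

   procedure Swap (Left, Right : in out Item) is
      Temp : constant Item := Left;
   begin
      Left  := Right;
      Right := Temp;
   end Swap;

   -- Each instantiation yields an ordinary procedure for one type.
   procedure Swap_Integers is new Swap (Item => Integer);
   procedure Swap_Booleans is new Swap (Item => Boolean);

begin
   Swap_Integers (A, B);
   Swap_Booleans (P, Q);
   Text_IO.Put_Line (Integer'Image (A) & Integer'Image (B));
   Text_IO.Put_Line (Boolean'Image (P) & " " & Boolean'Image (Q));
end Generic_Demo;
```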

4.3 Using too many types increases coding difficulty

The strong typing was very difficult to get used to, 
when one is accustomed to weakly typed languages such 
as FORTRAN. It was easy to create too many new 
types as well. 

Often a brand new type was created with a strict
range appropriate for one portion of the application.
Then in other areas where subtypes could have been
used, the range on the original type was found to be too
restrictive, so another brand new type was created
instead to handle the new situation. Then a whole new
set of operations had to be created as well for the
additional new type. Next time the team would recommend
creating a more general new type, and using many
different subtypes of the original type, rather than
creating more new types. In this way operations can be
reused and there are far fewer main types to keep track
of. Designers need to spend time developing families of
types that inherit properties from one another.
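
The recommendation can be sketched as follows: one general
type carries the operations, and narrower ranges are expressed
as subtypes of it. The type names and ranges are hypothetical
and chosen only for illustration.

```ada
with Text_IO;

procedure Subtype_Demo is

   -- One general type, with its operation defined once.
   type Angle_Degrees is digits 6 range -360.0 .. 360.0;

   -- Narrower ranges expressed as subtypes rather than new types.
   subtype Latitude  is Angle_Degrees range  -90.0 ..  90.0;
   subtype Longitude is Angle_Degrees range -180.0 .. 180.0;

   Lat  : Latitude  := 38.0;
   Lon  : Longitude := -76.0;
   Half : Angle_Degrees;

   function Halve (A : Angle_Degrees) return Angle_Degrees is
   begin
      return A / 2.0;
   end Halve;

begin
   -- The same operation serves both subtypes; no conversions needed.
   Half := Halve (Lat);
   Text_IO.Put_Line (Integer'Image (Integer (Half)));
   Half := Halve (Lon);
   Text_IO.Put_Line (Integer'Image (Integer (Half)));
end Subtype_Demo;
```

Had Latitude and Longitude been declared as two separate new
types, Halve would have needed one version per type, or explicit
type conversions at each call.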

The strong typing presented some problems when
testing units, though it prevents some kinds of errors,
also. It was harder to write test drivers that could deal 
with all the types in the units being tested. It was also 
harder to do the I/O, since so many types had to be 
dealt with. 

4.4 Tasking was difficult to code and test; however, this
seems due to concurrency in general and not Ada
specifically.

Tasks were used in the user interface part of the 
project. The user was given many options which made 
the interactions between the tasks of the subsystem 
very difficult to plan and execute correctly. 

It was harder to code tasks from the design than it 
was to code other types of units. However, this is not 
really due to Ada, but rather it is the nature of con- 
currency problems. The language made the use of task- 
ing easier, and encouraged the developers to use tasking 
more than they would have otherwise. The dynamic 
relationships of concurrency cannot be represented in 
the design (termination, rendezvous, multiple threads of 
control). Correctness was very difficult to assure, as is 
usual with these kinds of problems, and deadlock was 
hard to avoid. Functional testing was done, which is 
the usual type in this environment. 
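
For readers unfamiliar with the terms, the sketch below shows
a single rendezvous and task termination in Ada. The task name
Display and its entry Show are hypothetical, and the example is
far simpler than the user interface tasks described above.

```ada
with Text_IO;

procedure Rendezvous_Demo is

   task Display is
      entry Show (Value : in Integer);
   end Display;

   task body Display is
   begin
      loop
         select
            -- Rendezvous: the caller waits until this accept completes.
            accept Show (Value : in Integer) do
               Text_IO.Put_Line ("Value:" & Integer'Image (Value));
            end Show;
         or
            -- Lets the task end once the main procedure has finished.
            terminate;
         end select;
      end loop;
   end Display;

begin
   for I in 1 .. 3 loop
      Display.Show (I);
   end loop;
end Rendezvous_Demo;
```

Without the terminate alternative the task would wait forever
for another call and the program would never finish, which is
one of the dynamic relationships that is hard to show in a
static design.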

The major problem the developers had was with
exceptions. These are no worse with tasking than they 
are with any other program unit, however. 

4.5 Exception handlers have to be coded carefully. 

The key problem with exceptions is deciding the
best way to handle them. Errors and the sources of
errors can be hard to find if the exception handlers are
not coded carefully. Suppose a particular procedure will
call another unit, expecting some function to be
performed, and certain kinds of data to be returned. If an
exception is raised and handled in the called unit, and
the handler is non-specific for the problem raising the
exception (e.g., “when others”), the caller gets control back
without the required function being performed. But the
exception was handled and data was returned, so the
call looks successful. Yet as soon as the caller tries to
use the data from the routine where the exception was
raised and handled, it fails. Because of propagation, it
can be very difficult to trace back the error to the
original source of the problem.


Several members of the team would recommend
incorporating the way exceptions are to be handled into
the design, rather than leaving this until implementation.
Put into the design (1) what exception would be
raised, (2) where it will be handled, and (3) what should
happen.
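
The difference between a non-specific handler and a designed
one can be sketched as below. The exception Bad_Sensor_Data and
the two procedures are hypothetical names, and the doubling of
Raw merely stands in for the required function.

```ada
with Text_IO;

procedure Exception_Demo is

   Bad_Sensor_Data : exception;   -- hypothetical exception name

   Value : Integer;

   -- Swallows every exception: the caller cannot tell that the
   -- required function was never performed.
   procedure Read_Silently (Raw : in Integer; Reading : out Integer) is
   begin
      if Raw < 0 then
         raise Bad_Sensor_Data;
      end if;
      Reading := Raw * 2;
   exception
      when others =>
         Reading := 0;   -- the call now looks successful
   end Read_Silently;

   -- Lets the specific exception propagate to the caller.
   procedure Read_Checked (Raw : in Integer; Reading : out Integer) is
   begin
      if Raw < 0 then
         raise Bad_Sensor_Data;
      end if;
      Reading := Raw * 2;
   end Read_Checked;

begin
   Read_Silently (-1, Value);
   Text_IO.Put_Line ("Silent read returned" & Integer'Image (Value));

   begin
      Read_Checked (-1, Value);
   exception
      -- Where it is handled and what should happen are explicit.
      when Bad_Sensor_Data =>
         Text_IO.Put_Line ("Bad sensor data reported to the caller");
   end;
end Exception_Demo;
```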


Ada Features*

                         implementation
                         ease              benefit

tasking                                    +
generics                 +                 ++
strong typing            0                 0
exception handling       0                 +
nesting                  +                 -
separate specs/bodies    ++                ++


* This figure represents a subjective assessment 
based on team member interviews 


Summary 

We have learned several important things about 
four major areas in implementation. There are many 
advantages to using library units, though nesting can 
have its usefulness at some point below the subsystem 
level. Code reading helps train people in Ada, and helps 
to isolate style and logic errors. Unit testing was sub- 
stantially changed by using Ada: the first stages of 
integration often began before unit testing proceeded. 
Some Ada features are quite powerful and should be 
carefully used. 

It is important to remember that these results are 
derived from one specific environment. We must be 
very careful when extrapolating to other environments. 
There are also many questions still left to be answered. 
Studies of this project will continue, and other Ada
projects are being started. These will help us evaluate the
effects on longer term issues such as reuse and
maintainability of the Ada projects. We believe this project is
a good beginning to a better understanding of Ada use 
in production environments.

Abstract

Though Ada and Modula-2 are not object- 
oriented languages, an object-oriented 
viewpoint is crucial for effective use of their 
module facilities. It is therefore instructive to 
compare the capabilities of a modular language 
such as Ada with an archetypal object-oriented 
language such as Smalltalk. The comparison in 
this paper is in terms of the basic properties of 
encapsulation, inheritance and binding, with 
examples given in both languages. This 
comparison highlights the strengths and 
weaknesses of both types of languages from an 
object-oriented perspective. It also provides a 
basis for the application of experience from 
Smalltalk and other object-oriented languages 
to increasingly widely used modular languages 
such as Ada and Modula-2. 


1. Introduction 

Procedural programming techniques concentrate 
on functions and actions. Object-oriented 
techniques, by contrast, attempt to clearly 
model the problem domain. The designers of 
Simula recognized the attractiveness of this 
concept for simulation and included specific 
constructs for object-oriented programming 
[Dahl 68]. Since then, several programming
languages have been designed specifically for
general-purpose object-oriented programming.
The archetypal example is, perhaps, Smalltalk 
because the language is structured so completely 
around the object concept [Goldberg 83]. 

Ada* [DOD 83] and Modula-2 [Wirth 83] are 
not designed to be object-oriented 
programming languages. However, they do 
have certain object-oriented features which are 
descendants of Simula constructs. Further, 
object-oriented concepts have become 
extremely popular for design of Ada programs 
(e.g., see [Booch 83]). This paper compares and
contrasts the strict object-oriented capabilities 
of Smalltalk with the object-oriented features 
of Ada. The comparison is in terms of the 
basic object-oriented properties of 
encapsulation, inheritance and binding. I have 
attempted to keep the main body of the paper 
fairly objective, reserving my more 
judgemental comments for the conclusion. 


*Ada is a registered trademark of the US 
Government (Ada Joint Program Office)

2. Encapsulation

An object consists of some private data and a 
set of operations on that data. The intent of an 
object is to encapsulate the representation of a 
problem domain entity which changes state over 
time. Abstraction deals with how an object 
presents this representation to other objects, 
suppressing nonessential details. The stronger 
the abstraction of an object, the more details 
are suppressed by the abstract concept. The 
principle of information hiding states that such 
details should be kept secret from other objects, 
so as to better preserve the abstraction modeled 
by the object. Both Smalltalk and Ada directly 
support these basic encapsulation concepts for 
objects. In Smalltalk these features are the 
central structure of the language while in Ada 
they are added to a core language of 
ALGOL/Pascal heritage. 


In Smalltalk, objects are always instances of a
class which represents a set of problem domain
entities of the same kind. All instances of a
class provide the same interface (set of
operations) to other objects. A class thus
represents a single abstraction. The class
definition provides implementations for each of
the instance operations (methods in Smalltalk)
and also defines the form of the internal
memory of all instances.

A Smalltalk method is called by sending a
message to the object, such as:

MyFinances receive: 25.50

The protocol of an object is the set of all
messages that may be received by the object. A
class itself has a protocol which usually includes
a few messages to request creation of instances,
e.g. "Finances new". Note that protocols are
not really a part of the Smalltalk language
proper, but are documentation of the
abstraction represented by a Smalltalk class.

The basic object-oriented construct in Ada is
the package. Unlike Smalltalk, objects can be
defined directly in Ada without having any
class. Further, Ada requires the definition of
the interface of an object separately from the
implementation of the object. This is done in a
package specification. Ada uses a more
traditional procedure call syntax for object
operations.

Ada is a strongly typed language, so the type of
every operation argument and return value must
be declared. A package specification provides
enough declarative information for compile-
time syntax and type checking. Additional
operation descriptions, such as in the Smalltalk
protocol, can be provided by comments. Other
code refers to package operations using a
qualified name, e.g., "Finances.Receive". The
package body gives the implementation of the
package.

Example 1 - Finances

Class Finances is a simple class of objects which
represent financial accounts of income and debt
(all examples are simplified and adapted from
[Goldberg 83]). The protocol for this class is:

Finances class protocol

instance creation

   initialBalance: amount   Begin a financial account
                            with "amount" as the
                            amount of money on
                            hand.

   new                      Begin a financial account
                            with 0 as the amount of
                            money on hand.

Finances instance protocol

transactions

   receive: amount          Receive an amount of
                            money.

   spend: amount            Spend an amount of
                            money.

inquiries

   cashOnHand               Answer the total amount
                            of money currently on
                            hand.

   totalReceived            Answer the total amount
                            of money received so far.

   totalSpent               Answer the total amount
                            of money spent so far.

The implementation of the Finances class must
include a method for each of the messages in
the protocol. It also defines the names of a set 
of instance variables which represent the 
internal data of each class instance. The 
instance variables and the implementations of 
the methods are hidden from users of instances 
of the class. In the Smalltalk-80 system, the 
various parts of a class definition are accessed 
through an "interactive system browser." The 
textual description used here is based on the 
one used in [Goldberg 83]. The definition of 
class Finances is: 

class name                 Finances
superclass                 Object
instance variable names    income
                           debt

class methods

instance creation

   initialBalance: amount
      ^ super new setInitialBalance: amount

   new
      ^ super new setInitialBalance: 0

instance methods

transactions

   receive: amount
      income <- income + amount

   spend: amount
      debt <- debt + amount

inquiries

   cashOnHand
      ^ income - debt

   totalReceived
      ^ income

   totalSpent
      ^ debt

private

   setInitialBalance: amount
      income <- amount.
      debt <- 0

Note that "super new" refers to the system
method to create a new instance, "^" indicates
returning a value and "<-" indicates assignment.
Some examples of use of this class are: 

MyFinances <- Finances initialBalance: 500.00.
MyFinances spend: 32.50.
MyFinances spend: foodCost + salesTax.
MyFinances receive: pay.
tax <- taxRate * (MyFinances totalReceived)

The specification for an Ada package Finances 
corresponding to the above Smalltalk protocol 
is: 

package Finances is

   subtype MONEY is FLOAT;

   -- Initialization

   procedure Set (Balance : in MONEY);

   -- Transactions

   procedure Receive (Amount : in MONEY);
   procedure Spend (Amount : in MONEY);

   -- Inquiries

   function Cash_On_Hand return MONEY;
   function Total_Received return MONEY;
   function Total_Spent return MONEY;

end Finances;

The above specification for Finances really does 
not define a complete object in the Smalltalk 
sense. This is because a package is a static 
program module, and cannot be passed around 
as data. For an object to be passed as data in 
Ada it must have a type. A type is analogous to 
a Smalltalk class in that it represents a set of 
objects with the same set of operations and 
internal data. An object type is called a private 
type in Ada because the representation of the 
internal data is hidden. The specification for a 
private type FINANCES is:

package Finance_Handler is

   type FINANCES is private;
   subtype MONEY is FLOAT;

   -- Instance creation

   function Initial (Balance : MONEY)
      return FINANCES;

   -- Transactions

   procedure Receive
      ( Account : in out FINANCES;
        Amount  : in MONEY );
   procedure Spend
      ( Account : in out FINANCES;
        Amount  : in MONEY );

   -- Inquiries

   function Cash_On_Hand
      ( Account : FINANCES )
      return MONEY;
   function Total_Received
      ( Account : FINANCES )
      return MONEY;
   function Total_Spent
      ( Account : FINANCES )
      return MONEY;

private

   type FINANCES is
      record
         Income : MONEY := 0.00;
         Debt   : MONEY := 0.00;
      end record;

end Finance_Handler;

Private types must be defined within packages. 
Package Finance_Handler specifies each of the
operations on objects of type FINANCES, while 
the type itself defines the internal data for each 
object. The private part of the package 
contains the definition of type FINANCES in 
terms of other Ada type constructs. In this 
case, objects of type FINANCES are effectively 
declared to have two instance variables, as in 
the Smalltalk example. (The private part of a 
package is logically part of the package 
implementation, not the specification. It is 
included in the specification only so that the 
compiler can tell from the specification alone 
how much space to allocate for objects of 
private types.) The package Finance_Handler is
in some ways similar to the metaclass of the
Smalltalk class Finances. In Smalltalk, a
metaclass is the class of a class. Both the 
metaclass and the handler package provide a 
framework for the definition of a class, and 
they also allow for the definition of class 
variables and class operations. 

Since the declaration of instance variables is in 
the private part of the specification of 
Finance_Handler, the package body only needs 
to define implementations for each of the 
specified operations: 

package body Finance_Handler is

   -- Instance creation

   function Initial (Balance : MONEY)
      return FINANCES is
   begin
      return
         ( Income => Balance,
           Debt   => 0.00 );
   end Initial;

   -- Transactions

   procedure Receive
      ( Account : in out FINANCES;
        Amount  : in MONEY ) is
   begin
      Account.Income := Account.Income
                        + Amount;
   end Receive;

   procedure Spend
      ( Account : in out FINANCES;
        Amount  : in MONEY ) is
   begin
      Account.Debt := Account.Debt
                      + Amount;
   end Spend;

   -- Inquiries

   function Cash_On_Hand
      ( Account : FINANCES )
      return MONEY is
   begin
      return
         Account.Income - Account.Debt;
   end Cash_On_Hand;

   function Total_Received
      ( Account : FINANCES )
      return MONEY is
   begin
      return Account.Income;
   end Total_Received;

   function Total_Spent
      ( Account : FINANCES )
      return MONEY is
   begin
      return Account.Debt;
   end Total_Spent;

end Finance_Handler;

Each FINANCES operation explicitly includes 
an Account of type FINANCES as one of its 
parameters. The instance variables of an 
Account are then accessed using a qualified 
notation such as "Account.Income". This access
to instance variables is only allowed within the 
body of package Finance_Handler. Some 
examples of the use of type FINANCES are: 

declare

   My_Finances
      : Finance_Handler.FINANCES
      := Finance_Handler.Initial
            (Balance => 500.00);

begin

   Finance_Handler.Spend
      ( Account => My_Finances,
        Amount  => 32.50 );
   Finance_Handler.Spend
      ( Account => My_Finances,
        Amount  => Food_Cost + Sales_Tax );
   Finance_Handler.Receive
      ( Account => My_Finances,
        Amount  => Pay );

   Tax := Tax_Rate
          * Finance_Handler.Total_Received
               (My_Finances);

end;

Packages in Ada allow the definition of objects 
as program modules or the definition of classes 
as private types. Packages cannot themselves be 
passed as data, but the instances of private 
types can. It is also possible in Ada to define
classes of objects which cannot be passed as
data. This is done using a generic package 
which serves as a template for instances of the 
class. For example, the earlier specification for 
package Finances can be made generic by 
simply adding the keyword generic at the 
beginning: 

generic 

package Finances is 


end Finances; 

Other packages can then be declared as 
instantiations of the generic package. For 
example: 

declare

   package My_Finances is
      new Finances;

begin

   My_Finances.Receive (Amount => Pay);

   Cash := My_Finances.Cash_On_Hand;

end;
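
The idea of a generic package as a template for objects that cannot be passed as data can be sketched as a complete program. This is only an illustration under stated assumptions: the definition of MONEY, the procedure name Generic_Finances_Demo, the helper Show and the second instance Your_Finances are not from the text, and the body of Finances is a guess at the obvious implementation. Each instantiation gets its own copy of the state variables declared in the generic body.

```ada
with Text_IO;

procedure Generic_Finances_Demo is

   -- MONEY is an assumed definition; the text does not show its own.
   type MONEY is delta 0.01 range -1_000_000.00 .. 1_000_000.00;
   package Money_IO is new Text_IO.Fixed_IO (MONEY);

   -- A generic package with no parameters: a template for objects.
   generic
   package Finances is
      procedure Receive (Amount : in MONEY);
      procedure Spend   (Amount : in MONEY);
      function Cash_On_Hand return MONEY;
   end Finances;

   package body Finances is

      -- Instance variables: one copy per instantiation.
      Income : MONEY := 0.00;
      Debt   : MONEY := 0.00;

      procedure Receive (Amount : in MONEY) is
      begin
         Income := Income + Amount;
      end Receive;

      procedure Spend (Amount : in MONEY) is
      begin
         Debt := Debt + Amount;
      end Spend;

      function Cash_On_Hand return MONEY is
      begin
         return Income - Debt;
      end Cash_On_Hand;

   end Finances;

   -- Two independent objects of the same class.
   package My_Finances   is new Finances;
   package Your_Finances is new Finances;

   -- Hypothetical helper that prints a label and an amount.
   procedure Show (Label : in STRING; Value : in MONEY) is
   begin
      Text_IO.Put (Label);
      Money_IO.Put (Value, Fore => 1, Aft => 2, Exp => 0);
      Text_IO.New_Line;
   end Show;

begin
   My_Finances.Receive (Amount => 500.00);
   My_Finances.Spend   (Amount => 32.50);
   Your_Finances.Receive (Amount => 100.00);

   -- The two balances differ because the state is not shared.
   Show ("Mine:  ", My_Finances.Cash_On_Hand);
   Show ("Yours: ", Your_Finances.Cash_On_Hand);
end Generic_Finances_Demo;
```

Because My_Finances and Your_Finances are packages rather than values, neither can be assigned or passed as a parameter, which is the limitation noted above.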

I will have more to say later on other important 
roles of generics in Ada. 

3. Inheritance 

A class represents a common abstraction of a set 
of entities, suppressing their differences. At a 
lower level of abstraction, some entities may 
differ from others. A subclass represents a 
subset of the entities of a class. A subclass 
inherits general abstract properties from its 
superclass, defining only the specific
differences which appear at its lower level of 
abstraction. This technique of subclass 
inheritance allows the incremental building of 
application-specific abstractions from general 
abstractions. 

Smalltalk directly supports the concept of 
subclassing and inheritance. In Smalltalk every 
class has a superclass, except for the system 
class Object which describes the similarities of 
all objects. Instances of a subclass are the same
as instances of the superclass except for
differences explicitly stated in the subclass 
definition. The allowed differences are the 
addition of instance variables, the addition of 
new methods and the overriding of superclass 
methods. An instance of a subclass will 
respond to at least all of the same messages as 
instances of its superclass, though not 
necessarily in exactly the same way. 

Ada does not provide direct support for 
subclassing or inheritance. However, the 
concept of inheritance can be used profitably 
within Ada, in some ways more generally than 
in Smalltalk. When defining a subclass in Ada, 
it is still necessary to declare all operations of 
that subclass, even those inherited from a 
superclass. Thus the specification of a subclass 
package will include all the operations of the 
superclass and possibly some additional ones. 
(This also results in a hiding of the use of 
inheritance reminiscent of the discussion in 
[Snyder 86].) In the body of the subclass 
package, inherited operations must be 
implemented as call-throughs to the operations 
of the superclass. 

Example 2 -- Deductible Finances

The class DeductibleFinances is a subclass of 
the Finances class of Section 2. Instances of 
DeductibleFinances have the same functions as 
instances of Finances for receiving and 
spending money. However, they also keep 
track of tax deductible expenditures. The 
definition of DeductibleFinances specifies one 
new instance variable, four new instance 
methods and overrides two class methods: 

class name                  DeductibleFinances
superclass                  Finances
instance variable names     deductibleDebt

class methods

instance creation

    initialBalance: amount
        ^ (super initialBalance: amount) zeroDeduction

    new
        ^ super new zeroDeduction

instance methods

transactions

    spendDeductible: amount
        self spend: amount deducting: amount.

    spend: amount deducting: deductibleAmount
        super spend: amount.
        deductibleDebt <- deductibleDebt + deductibleAmount

inquiries

    totalDeduction
        ^ deductibleDebt

private

    zeroDeduction
        deductibleDebt <- 0

Note that sending a message to "self" results in a 
call on one of an object’s own methods, while 
sending a message to "super" results in a call on 
one of the methods of the superclass Finances. 

Now consider an Ada type which defines a 
subclass of the FINANCES type of Section 2: 

with Finance_Handler;

package Deductible_Finance_Handler is

   type DEDUCTIBLE_FINANCES is private;
   subtype MONEY is
      Finance_Handler.MONEY;

   -- Instance creation
   function Initial ( Balance : MONEY )
      return DEDUCTIBLE_FINANCES;

   -- Transactions
   procedure Receive
      ( Account : in out DEDUCTIBLE_FINANCES;
        Amount  : in MONEY );
   procedure Spend
      ( Account           : in out DEDUCTIBLE_FINANCES;
        Amount            : in MONEY;
        Deductible_Amount : in MONEY := 0.00 );
   procedure Spend_Deductible
      ( Account : in out DEDUCTIBLE_FINANCES;
        Amount  : in MONEY );

   -- Inquiries

   function Cash_On_Hand
      ( Account : DEDUCTIBLE_FINANCES )
      return MONEY;
   function Total_Received
      ( Account : DEDUCTIBLE_FINANCES )
      return MONEY;
   function Total_Spent
      ( Account : DEDUCTIBLE_FINANCES )
      return MONEY;
   function Total_Deductions
      ( Account : DEDUCTIBLE_FINANCES )
      return MONEY;

private

   type DEDUCTIBLE_FINANCES is
      record
         Finances        : Finance_Handler.FINANCES;
         Deductible_Debt : MONEY := 0.00;
      end record;

end Deductible_Finance_Handler;

Package Deductible_Finance_Handler has the
new operations Spend_Deductible and
Total_Deductions, and it has a modified Spend
operation. The Spend procedure has a
Deductible_Amount parameter with a default
value of 0.00.

DEDUCTIBLE_FINANCES implements
inheritance from FINANCES by using the 
instance variable Finances of type FINANCES. 
Inherited operations are then implemented as 
call-throughs to operations on Finances: 

package body Deductible_Finance_Handler is

   -- Instance creation
   function Initial ( Balance : MONEY )
      return DEDUCTIBLE_FINANCES is
   begin
      return
         ( Finances        => Finance_Handler.Initial(Balance),
           Deductible_Debt => 0.00 );
   end Initial;

   -- Transactions
   procedure Receive
      ( Account : in out DEDUCTIBLE_FINANCES;
        Amount  : in MONEY ) is
   begin
      -- INHERITED --
      Finance_Handler.Receive
         ( Account => Account.Finances,
           Amount  => Amount );
   end Receive;

   procedure Spend
      ( Account           : in out DEDUCTIBLE_FINANCES;
        Amount            : in MONEY;
        Deductible_Amount : in MONEY := 0.00 ) is
   begin
      Finance_Handler.Spend
         ( Account => Account.Finances,
           Amount  => Amount );
      Account.Deductible_Debt
         := Account.Deductible_Debt
            + Deductible_Amount;
   end Spend;

   procedure Spend_Deductible
      ( Account : in out DEDUCTIBLE_FINANCES;
        Amount  : in MONEY ) is
   begin
      Spend
         ( Account           => Account,
           Amount            => Amount,
           Deductible_Amount => Amount );
   end Spend_Deductible;

   -- Inquiries

   function Cash_On_Hand
      ( Account : DEDUCTIBLE_FINANCES )
      return MONEY is
   begin
      -- INHERITED --
      return Finance_Handler.Cash_On_Hand
                (Account.Finances);
   end Cash_On_Hand;

   function Total_Received
      ( Account : DEDUCTIBLE_FINANCES )
      return MONEY is
   begin
      -- INHERITED --
      return Finance_Handler.Total_Received
                (Account.Finances);
   end Total_Received;

   function Total_Spent
      ( Account : DEDUCTIBLE_FINANCES )
      return MONEY is
   begin
      -- INHERITED --
      return Finance_Handler.Total_Spent
                (Account.Finances);
   end Total_Spent;

   function Total_Deductions
      ( Account : DEDUCTIBLE_FINANCES )
      return MONEY is
   begin
      return Account.Deductible_Debt;
   end Total_Deductions;

end Deductible_Finance_Handler;

Unlike Smalltalk, implementing inheritance in
Ada requires an extra level of operation call. 
Also, in Ada the subclass does not have direct 
access to the instance variables of the 
superclass. The superclass package presents the 
same abstract interface to subclass packages as 
to any other code. This tightens the 
encapsulation of the superclass abstraction. It 
also allows easy extension to multiple 
inheritance where a subclass may inherit 
operations from more than one superclass. 
Multiple inheritance simply requires multiple 
superclass instance variables with inherited 
operations calling through to the appropriate
superclass operations. In this case the new class 
is really a composite abstraction formed from 
more general component classes. 
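
As a rough sketch of this composite form of multiple inheritance, the program below builds a class from two component classes. Everything beyond the call-through technique itself is an assumption made for illustration: the definition of MONEY, the cut-down Finance_Handler, and the hypothetical Audit_Handler and Audited_Finance_Handler packages do not appear in the text.

```ada
with Text_IO;

procedure Composite_Demo is

   -- MONEY is an assumed definition; the text does not show its own.
   type MONEY is delta 0.01 range -1_000_000.00 .. 1_000_000.00;
   package Money_IO is new Text_IO.Fixed_IO (MONEY);

   -- First component class: a cut-down Finance_Handler.
   package Finance_Handler is
      type FINANCES is private;
      procedure Receive (Account : in out FINANCES; Amount : in MONEY);
      function Cash_On_Hand (Account : FINANCES) return MONEY;
   private
      type FINANCES is
         record
            Income : MONEY := 0.00;
            Debt   : MONEY := 0.00;
         end record;
   end Finance_Handler;

   -- Second component class (hypothetical): counts transactions.
   package Audit_Handler is
      type AUDIT_TRAIL is private;
      procedure Note_Entry (Trail : in out AUDIT_TRAIL);
      function Entry_Count (Trail : AUDIT_TRAIL) return NATURAL;
   private
      type AUDIT_TRAIL is
         record
            Count : NATURAL := 0;
         end record;
   end Audit_Handler;

   -- Composite class: one instance variable per superclass.
   package Audited_Finance_Handler is
      type AUDITED_FINANCES is private;
      procedure Receive
        (Account : in out AUDITED_FINANCES; Amount : in MONEY);
      function Cash_On_Hand (Account : AUDITED_FINANCES) return MONEY;
      function Entry_Count (Account : AUDITED_FINANCES) return NATURAL;
   private
      type AUDITED_FINANCES is
         record
            Finances : Finance_Handler.FINANCES;
            Trail    : Audit_Handler.AUDIT_TRAIL;
         end record;
   end Audited_Finance_Handler;

   package body Finance_Handler is
      procedure Receive (Account : in out FINANCES; Amount : in MONEY) is
      begin
         Account.Income := Account.Income + Amount;
      end Receive;
      function Cash_On_Hand (Account : FINANCES) return MONEY is
      begin
         return Account.Income - Account.Debt;
      end Cash_On_Hand;
   end Finance_Handler;

   package body Audit_Handler is
      procedure Note_Entry (Trail : in out AUDIT_TRAIL) is
      begin
         Trail.Count := Trail.Count + 1;
      end Note_Entry;
      function Entry_Count (Trail : AUDIT_TRAIL) return NATURAL is
      begin
         return Trail.Count;
      end Entry_Count;
   end Audit_Handler;

   package body Audited_Finance_Handler is
      procedure Receive
        (Account : in out AUDITED_FINANCES; Amount : in MONEY) is
      begin
         -- Call through to both superclasses.
         Finance_Handler.Receive (Account.Finances, Amount);
         Audit_Handler.Note_Entry (Account.Trail);
      end Receive;
      function Cash_On_Hand (Account : AUDITED_FINANCES) return MONEY is
      begin
         return Finance_Handler.Cash_On_Hand (Account.Finances);
      end Cash_On_Hand;
      function Entry_Count (Account : AUDITED_FINANCES) return NATURAL is
      begin
         return Audit_Handler.Entry_Count (Account.Trail);
      end Entry_Count;
   end Audited_Finance_Handler;

begin
   declare
      Account : Audited_Finance_Handler.AUDITED_FINANCES;
   begin
      Audited_Finance_Handler.Receive (Account, Amount => 500.00);
      Audited_Finance_Handler.Receive (Account, Amount => 25.00);
      Money_IO.Put (Audited_Finance_Handler.Cash_On_Hand (Account),
                    Fore => 1, Aft => 2, Exp => 0);
      Text_IO.Put_Line
        (NATURAL'IMAGE (Audited_Finance_Handler.Entry_Count (Account))
         & " entries");
   end;
end Composite_Demo;
```

The composite package sees each component only through its visible operations, so neither component's representation leaks into the new class.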

The main drawback of this approach is that the 
Ada typing system does not recognize 
subclassing. In Ada all private types are 
distinct. Even though the type
DEDUCTIBLE_FINANCES is logically a
subclass of type FINANCES, the type
DEDUCTIBLE_FINANCES is not a subtype of
type FINANCES. It is not possible, for
instance, to pass an instance of type
DEDUCTIBLE_FINANCES to a procedure
expecting an argument of type FINANCES.
The Ada compiler would see this as a type 
inconsistency. A partial solution to this 
involves the use of the Ada generic facility, and 
will be discussed later in Section 4. However, 
the problem cannot be fully overcome in Ada, 
and [Meyer 86] clearly shows that true 
inheritance is more powerful than genericity.

4. Binding 

The Smalltalk message passing mechanism 
operates dynamically. When a message is sent 
to a Smalltalk object, the method to respond to 
that message is looked up at run-time in the
object’s class (and possibly superclasses). 
Further, Smalltalk variables are not typed, so 
they may contain objects of any class. Thus it 
is generally not possible to determine statically 
exactly what method in what class will respond 
to a message. Messages are dynamically bound 
to methods at run-time. If an object cannot 
respond to a message, there is a run-time error. 

The use of dynamic binding gives the
programmer great freedom to create general
code. Any object can be used in an instance
variable or as an argument in a message as long
as it can respond to the messages sent to it.
Another use of dynamic binding in Smalltalk is
with the "pseudo-variable" "self", which is used
by an object to send messages to itself. When a
message is sent to an object, "self" is set to the
object to which the message is sent. The
dynamic binding of messages sent to "self"
allows a class to call on methods that are really
defined in a subclass.

Unlike Smalltalk, Ada is a strongly typed 
language. This means that all variables and 
parameters must be declared to be of a single 
specific type. This allows an Ada compiler to 
check statically that only values of the correct 
type are being assigned to variables and used as 
arguments. The Ada compiler can also always 
determine exactly what operation from what 
package (if any) is being invoked by a given 
call. Operation calls are thus statically bound to 
the proper operation. Undefined operation calls 
are always discovered at compile-time. 

A way around the restrictions of static binding involves the use of generics.
In addition to their role in creating classes of 
packages, generics also allow a package to be 
parameterized with type and subprogram 
parameters. This feature can be used to declare 
a package which can use any class with certain 
needed operations. Generic facilities can also 
be used to allow a class to defer the 
implementation of some operations to 
subclasses. 
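
A minimal sketch of deferring an operation through a generic subprogram parameter follows. The names Payroll, Tax_On, Net_Pay, Flat_Tax and No_Tax, and the definition of MONEY, are hypothetical and chosen only for illustration. The generic package plays the role of a class that defines Net_Pay but leaves Tax_On to be supplied by each instantiation, much as a subclass would supply it.

```ada
with Text_IO;

procedure Deferred_Operation_Demo is

   -- MONEY is an assumed definition; the text does not show its own.
   type MONEY is delta 0.01 range -1_000_000.00 .. 1_000_000.00;
   package Money_IO is new Text_IO.Fixed_IO (MONEY);

   -- Net_Pay is defined here; Tax_On is deferred to the instantiation.
   generic
      with function Tax_On (Gross : MONEY) return MONEY;
   package Payroll is
      function Net_Pay (Gross : MONEY) return MONEY;
   end Payroll;

   -- Two candidate implementations of the deferred operation.
   function Flat_Tax (Gross : MONEY) return MONEY;
   function No_Tax   (Gross : MONEY) return MONEY;

   package body Payroll is
      function Net_Pay (Gross : MONEY) return MONEY is
      begin
         -- Statically bound to whichever Tax_On this instance received.
         return Gross - Tax_On (Gross);
      end Net_Pay;
   end Payroll;

   function Flat_Tax (Gross : MONEY) return MONEY is
   begin
      return Gross / 10;   -- one tenth of the gross amount
   end Flat_Tax;

   function No_Tax (Gross : MONEY) return MONEY is
   begin
      return 0.00;
   end No_Tax;

   -- Each instantiation fixes the deferred operation at compile time.
   package Taxed_Payroll  is new Payroll (Tax_On => Flat_Tax);
   package Exempt_Payroll is new Payroll (Tax_On => No_Tax);

begin
   Text_IO.Put ("Taxed:  ");
   Money_IO.Put (Taxed_Payroll.Net_Pay (1000.00),
                 Fore => 1, Aft => 2, Exp => 0);
   Text_IO.New_Line;

   Text_IO.Put ("Exempt: ");
   Money_IO.Put (Exempt_Payroll.Net_Pay (1000.00),
                 Fore => 1, Aft => 2, Exp => 0);
   Text_IO.New_Line;
end Deferred_Operation_Demo;
```

Unlike a message sent to "self" in Smalltalk, the call to Tax_On is resolved when the generic is instantiated, so a missing or mismatched operation is reported by the compiler rather than at run-time.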

Example 3 -- Sample Space

The class SampleSpace represents random 
selection without replacement from a collection 
of items. It has the following protocol: 

SampleSpace class protocol

instance creation

    data: aCollection       Create an instance such that aCollection
                            is the sample space.

SampleSpace instance protocol

accessing

    next                    Answer the next element chosen at random
                            from the sample space, removing it from
                            the space.

    next: anInteger         Answer an ordered collection of anInteger
                            number of selections from the sample
                            space.

testing

    isEmpty                 Answer whether any items remain to be
                            sampled.

    size                    Answer the number of items remaining to
                            be sampled.


This protocol does not specify exactly what 
kind of collection must be used for the sample 
space. The class definition is: 


class name                  SampleSpace
superclass                  Object
instance variable names     data
                            rand

class methods

instance creation

    data: aCollection
        ^ super new setData: aCollection

instance methods

accessing

    next
        | item |
        self isEmpty ifTrue:
            [self error: 'no values exist in the sample space'].
        item <- data at: (rand next * data size) truncated + 1.
        data remove: item.
        ^ item

    next: anInteger
        | aCollection |
        aCollection <- OrderedCollection new: anInteger.
        anInteger timesRepeat: [aCollection addLast: self next].
        ^ aCollection

testing

    isEmpty
        ^ self size = 0

    size
        ^ data size

private

    setData: aCollection
        data <- aCollection.
        rand <- Random new

Note that local variables in methods are listed
between vertical bars at the beginning of the
method. Also, the definition of SampleSpace
uses an instance of the Smalltalk system class
Random to generate random numbers. In the
methods for "next" and "size", SampleSpace
sends the messages "at:", "size" and "remove:" to
the instance variable "data" which holds the
collection of sample space items. This means
that any object which can respond to "at:", "size"
and "remove:" can serve as the collection. This
object could be an instance of a Smalltalk
system class such as Array, or it could be an
instance of a user-defined class. An example of
the use of SampleSpace is shuffling a deck of
cards:

class name                  CardDeck
superclass                  Object
instance variable names     cards


    shuffle
        | sample |
        sample <- SampleSpace data: cards.
        cards <- sample next: cards size

An Ada generic Sample_Space package needs a
COLLECTION type and At, Size and Remove
operations. A specification for this package is:

generic

   type COLLECTION_TYPE is private;
   type ELEMENT_TYPE is private;

   with function At
      ( Collection : COLLECTION_TYPE;
        Index      : POSITIVE )
      return ELEMENT_TYPE;
   with function Size
      ( Collection : COLLECTION_TYPE )
      return NATURAL;
   with procedure Remove
      ( Collection : in out COLLECTION_TYPE;
        Element    : in ELEMENT_TYPE );

package Sample_Space is

   Empty : exception;

   type ELEMENT_LIST is
      array (NATURAL range <>)
      of ELEMENT_TYPE;

   -- Initialization
   procedure Set
      ( Data : in COLLECTION_TYPE );

   -- Accessing

   function Next return ELEMENT_TYPE;
   function Next ( Number : NATURAL )
      return ELEMENT_LIST;

   -- Testing

   function Is_Empty return BOOLEAN;
   function Size return NATURAL;

end Sample_Space;

Package Sample_Space uses the generic facility
both to parameterize itself and to allow a class
of objects (as discussed in Section 2). It would
also have been possible to define a generic
Sample_Space_Handler package with a
SAMPLE_SPACE type. This would have
allowed sample spaces to be passed as data, an
ability which is not really needed for the
present example.

The body of Sample_Space is:


with Random;

package body Sample_Space is

   -- Instance variable
   Sample_Data : COLLECTION_TYPE;

   -- Initialization
   procedure Set
      ( Data : in COLLECTION_TYPE ) is
   begin
      Sample_Data := Data;
   end Set;

   -- Accessing

   function Next return ELEMENT_TYPE is
      Item : ELEMENT_TYPE;
   begin
      if Is_Empty then
         raise Empty;
      end if;
      Item := At ( Sample_Data, Index =>
                   NATURAL((Random.Value*Size)+1) );
      Remove
         ( Collection => Sample_Data,
           Element    => Item );
      return Item;
   end Next;

   function Next ( Number : NATURAL )
      return ELEMENT_LIST is
      List : ELEMENT_LIST(1 .. Number);
   begin
      for I in 1 .. Number loop
         List(I) := Next;
      end loop;
      return List;
   end Next;

   -- Testing

   function Is_Empty return BOOLEAN is
   begin
      return (Size = 0);
   end Is_Empty;

   function Size return NATURAL is
   begin
      return Size(Sample_Data);
   end Size;

end Sample_Space;

The Sample_Space package body assumes the
availability of a package Random to generate
random numbers. Sample_Space could then be
used to shuffle an instance of private type
CARD_DECK:

with Sample_Space;

package body Card_Deck_Handler is


   package Sample is new Sample_Space
      ( COLLECTION_TYPE => CARD_DECK,
        ELEMENT_TYPE    => CARD_TYPE,
        At              => Card,
        Size            => Deck_Size,
        Remove          => Remove_Card );


   procedure Shuffle
      ( Cards : in out CARD_DECK ) is
   begin
      Sample.Set (Data => Cards);
      Cards := CARD_DECK
                  (Sample.Next(Deck_Size(Cards)));
   end Shuffle;


end Card_Deck_Handler;

Generic package Sample_Space is a template
for a general class of sample spaces. Since a 
COLLECTION_TYPE must be specified when 
Sample_Space is instantiated, each instance of 
this class can only handle a single type of 
collection for sampling. Thus an Ada compiler 
can still perform static type checking for each 
instantiation of generic packages. 

The dynamic binding and lack of typing in 
Smalltalk allow an instance of a subclass to be 
used anyplace an instance of its superclass may 
be used. As mentioned at the end of Section 3 
the Ada type system does not allow this because 
it views all private types as distinct and 
incompatible. The above generic technique can
help with this problem as well. A generic package
(or other program unit) which is parameterized 
by the types and operations it needs will be able 
to use any type with the necessary operations. 
Thus if the private type representing some class 
can be plugged into a generic, then a subclass 
type can also be plugged into that same generic. 
However, the generic must be instantiated
separately for each type. There is no easy way
in Ada to have a true polymorphic procedure, that
is, a single procedure with an argument which
accepts values of different types.
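
The need for one instantiation per type can be sketched as follows. This is an illustration only: the record types below are visible stand-ins for the private types of Sections 2 and 3, the component name Base replaces the Finances component used earlier, and the names Print_Balance, Print_Plain and Print_Deductible, along with the definition of MONEY, are assumptions not found in the text.

```ada
with Text_IO;

procedure Generic_Client_Demo is

   -- MONEY is an assumed definition; the text does not show its own.
   type MONEY is delta 0.01 range -1_000_000.00 .. 1_000_000.00;
   package Money_IO is new Text_IO.Fixed_IO (MONEY);

   -- Stand-ins for the private types, left visible for brevity.
   type FINANCES is
      record
         Income : MONEY := 0.00;
         Debt   : MONEY := 0.00;
      end record;

   type DEDUCTIBLE_FINANCES is
      record
         Base            : FINANCES;
         Deductible_Debt : MONEY := 0.00;
      end record;

   Plain      : constant FINANCES := (Income => 500.00, Debt => 32.50);
   Deductible : constant DEDUCTIBLE_FINANCES :=
     (Base => (Income => 200.00, Debt => 50.00), Deductible_Debt => 50.00);

   function Cash_On_Hand (Account : FINANCES) return MONEY;
   function Cash_On_Hand (Account : DEDUCTIBLE_FINANCES) return MONEY;

   -- One template, parameterized by the type and operation it needs.
   generic
      type ACCOUNT_TYPE is private;
      with function Cash_On_Hand (Account : ACCOUNT_TYPE) return MONEY;
   procedure Print_Balance (Account : in ACCOUNT_TYPE);

   function Cash_On_Hand (Account : FINANCES) return MONEY is
   begin
      return Account.Income - Account.Debt;
   end Cash_On_Hand;

   function Cash_On_Hand (Account : DEDUCTIBLE_FINANCES) return MONEY is
   begin
      return Cash_On_Hand (Account.Base);   -- call-through to superclass
   end Cash_On_Hand;

   procedure Print_Balance (Account : in ACCOUNT_TYPE) is
   begin
      Money_IO.Put (Cash_On_Hand (Account), Fore => 1, Aft => 2, Exp => 0);
      Text_IO.New_Line;
   end Print_Balance;

   -- The class and its subclass each need their own instantiation.
   procedure Print_Plain is new Print_Balance
     (ACCOUNT_TYPE => FINANCES, Cash_On_Hand => Cash_On_Hand);
   procedure Print_Deductible is new Print_Balance
     (ACCOUNT_TYPE => DEDUCTIBLE_FINANCES, Cash_On_Hand => Cash_On_Hand);

begin
   Print_Plain (Plain);
   Print_Deductible (Deductible);
   -- A call such as Print_Plain (Deductible) is rejected by the compiler.
end Generic_Client_Demo;
```

The source text of Print_Balance is written once, but the result is two distinct procedures with two distinct parameter types, which is weaker than a single polymorphic procedure.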

5. Conclusion 

Smalltalk and Ada are based on quite different 
philosophies. Smalltalk is designed to make it 
easier to program and to incrementally build 
and modify systems. Ada, on the other hand, 
purposefully places certain additional 
obligations on the programmer so that the final 
system will be more reliable and more 
maintainable. The Ada philosophy takes a 
much more life-cycle-oriented approach, 
recognizing that the most costly phase of software
development is maintenance, not coding. 

If the languages have such different bases, then 
why consider using object-oriented ideas for 
Ada? The answer is that object-oriented 
concepts really apply to more than just 
programming. In Ada circles, these concepts
are usually applied to design [Booch 83,
Seidewitz 86a, Seidewitz 86b]. The
object-oriented viewpoint is crucial to designing for
effective use of Ada’s package facility. Further,
the object-oriented approach can be a general 
way of thinking about software systems which 
can be applied from system specification 
through testing. This fits in quite well with the 
Ada life-cycle philosophy [Booch 86, Stark 87]. 

Still, Ada has some unfortunate drawbacks for 
object-oriented programming, especially in its 
lack of support for inheritance. As an object- 
oriented programming language Smalltalk is in 
many ways clearly superior to Ada. However, 
as a life-cycle software engineering language 
Ada has great advantages. Static strong typing
is crucial to increasing the reliability of 
software. Even with a good testing 
methodology, large amounts of code will not be 
thoroughly tested because it is only executed in 
rare combinations of situations. But when a 
system is running continuously for years, any 
errors that remain in these sections of code will 
almost certainly occur. This is especially true 
for the embedded real-time systems which were 
Ada’s original mandate. In Ada, all sections of 
code are checked by the compiler, and many 
errors can be caught before the testing phase 
due to static type checking and static operation 
binding. 

It is possible to support inheritance and even
polymorphism within a statically typed language 
(as in, for example, Eiffel [Meyer 86, Meyer 
87]). Inheritance might be added to Ada 
without too much change to the design of the 
language. Incorporation of polymorphism
would be much more difficult, and would probably
require a philosophical change in the Ada
language design. However, even with these 
deficiencies for object-oriented programming, 
Ada still provides a useful vehicle for applying 
object-oriented concepts throughout the 
software development life-cycle. 

Much of the above discussion also applies to 
other modular languages such as Modula-2 
(though Modula-2 does not directly support 
genericity). As these languages become more 
and more widely used it will be increasingly 
important to apply to them the experience in 
object-oriented software development gained 
from Smalltalk and other object-oriented 
languages. 